Abstract

Priced timed games are optimal-cost reachability games played between two players—the controller 
and the environment—by moving a token along the edges of infinite graphs of configurations of 
priced timed automata. The goal of the controller is to reach a given set of target locations as 
cheaply as possible, while the goal of the environment is the opposite. Priced timed games are 
known to be undecidable for timed automata with 3 or more clocks, while they are known to be 
decidable for automata with 1 clock. In an attempt to recover decidability for priced timed games 
Bouyer, Markey, and Sankur studied robust priced timed games where the environment has the 
power to slightly perturb delays proposed by the controller. Unfortunately, however, they showed 
that the natural problem of deciding the existence of optimal limit-strategy—optimal strategy of 
the controller where the perturbations tend to vanish in the limit—is undecidable with 10 or more 
clocks. In this paper we revisit this problem and improve our understanding of the decidability 
of these games. We show that the limit-strategy problem is already undecidable for a subclass of 
robust priced timed games with 5 or more clocks. On the positive side, we show the decidability of
the existence of almost optimal strategies for the same subclass of one-clock robust priced timed
games by adapting a classical construction by Bouyer et al. for one-clock priced timed games.
1 Introduction

Two-player zero-sum games on priced timed automata provide a mathematically elegant modeling
framework for the control-program synthesis problem in real-time systems. In these
games, two players, the controller and the environment, move a token along the edges of
the infinite graph of configurations of a timed automaton to construct an infinite execution of
the automaton in order to optimize a given performance criterion. The optimal strategy of the
controller in such a game then corresponds to a control program with the optimal performance.
By priced timed games (PTGs) we refer to such games on priced timed automata with optimal 
reachability-cost objective. The problem of deciding the existence of the optimal controller 
strategy in PTGs is undecidable |Hj with 3 or more clocks, while it is known to be decidable [5] 
for automata with 1 clock. Also, the $\varepsilon$-optimal strategies can be computed for priced timed
games under the non-Zeno assumption ESI- Unfortunately, however, the optimal controller 
strategies obtained as a result of solving games on timed automata may not be physically 
realizable due to unrealistic assumptions made in the modeling using timed automata, regarding
the capability of the controller in enforcing precise delays. This severely limits the
application of priced timed games in control-program synthesis for real-time systems. 

In order to overcome this limitation, Bouyer, Markey, and Sankur [7] argued the need for 
considering the existence of robust optimal strategies and introduced two different robustness 
semantics— excess and conservative —in priced timed games. The key assumption in their 
modeling is that the controller may not be able to apply an action at the exact time delays 
suggested by the optimal strategy. This phenomenon is modeled as a perturbation game where 
the time delay suggested by the controller can be perturbed by a bounded quantity. Notice 
that such a perturbation may result in the guard of the corresponding action being disabled. 
In the conservative semantics, it is the controller’s responsibility to make sure that the guards 
are satisfied after the perturbation. On the other hand, in the excess semantics, the controller 
is supposed to make sure that the guard is satisfied before the perturbation: an action can 
be executed even when its guard is disabled (“excess”) post perturbation and the valuations 
post perturbation will be reflected in the next state. The game-based characterization of robustness
in timed automata under the “excess” semantics was first proposed by Bouyer, Markey,
and Sankur |B] where they study the parameterized robust (qualitative) reachability problem
and show it to be EXPTIME-complete. The “conservative” semantics were studied for
reachability and Büchi objectives in m and shown to be PSPACE-complete. For a detailed
account of robustness in the timed setting, we refer to the excellent survey by Markey m-

Bouyer, Markey, and Sankur [7] showed that the problem for deciding the existence of 
the optimal strategy is undecidable for priced timed games with 10 or more clocks under the 
excess semantics. In this paper we further improve the understanding of the decidability of 
these games. However, to keep the presentation simple, we restrict our attention to turn-based 
games under excess semantics. To further generalize the setting, we permit both positive and 
negative price rates with the restriction that the accumulated cost in any cycle is non-negative 
(akin to the standard no-negative-cycle restriction in shortest path game problems on finite 
graphs). We improve the undecidability result of [7] by proving that optimal reachability
remains undecidable for robust priced timed automata with 5 clocks. Our second key result
is that, for a fixed $\delta$, the cost-optimal reachability problem for one-clock priced timed games
with the no-negative-cycle restriction is decidable for robust priced timed games with a given bound
on perturbations. To the best of our knowledge, this is the first decidability result known
for robust timed games under the excess semantics. A closely related result is [5], where
decidability is shown for robust timed games under the conservative semantics for a fixed $\delta$.

2 Preliminaries 

We write $\mathbb{R}$ for the set of reals and $\mathbb{Z}$ for the set of integers. Let $C$ be a finite set of real-valued
variables called clocks. A valuation on $C$ is a function $\nu : C \to \mathbb{R}$. We assume an arbitrary
but fixed ordering on the clocks and write $x_i$ for the clock with order $i$. This allows us to
treat a valuation $\nu$ as a point $(\nu(x_1), \nu(x_2), \ldots, \nu(x_n)) \in \mathbb{R}^{|C|}$. Abusing notation slightly,
we use a valuation on $C$ and a point in $\mathbb{R}^{|C|}$ interchangeably. For a subset of clocks $X \subseteq C$
and a valuation $\nu \in \mathbb{R}^{|C|}$, we write $\nu[X := 0]$ for the valuation where $\nu[X := 0](x) = 0$ if $x \in X$,
and $\nu[X := 0](x) = \nu(x)$ otherwise. The valuation $\mathbf{0} \in \mathbb{R}^{|C|}$ is a special valuation such that
$\mathbf{0}(x) = 0$ for all $x \in C$. A clock constraint over $C$ is a subset of $\mathbb{R}^{|C|}$. We say that a constraint
is rectangular if it is a conjunction of a finite set of constraints of the form $x \bowtie k$, where
$k \in \mathbb{Z}$, $x \in C$, and $\bowtie \in \{<, \le, =, >, \ge\}$. We write $\varphi(C)$ for the set of rectangular constraints over $C$.
For a constraint $g \in \varphi(C)$, we write $[\![g]\!]$ for the set of valuations in $\mathbb{R}^{|C|}$ satisfying $g$.
We use the terms constraints and guards interchangeably.

Following [5], we introduce priced timed games with external cost functions on target locations
(see Appendix A). For this purpose, we define a cost function [5] as a piecewise affine
continuous function $f : \mathbb{R}_{\ge 0} \to \mathbb{R} \cup \{+\infty, -\infty\}$. We write $\mathcal{F}$ for the set of all cost functions.

► Definition 1 (Priced Timed Games). A turn-based two-player priced timed game is a tuple
$\mathcal{G} = (L_1, L_2, L_{init}, C, X, \eta, T, f_{goal})$ where $L_i$ is a finite set of locations of Player $i$, $L_{init} \subseteq
L_1 \cup L_2$ (let $L_1 \cup L_2 = L$) is a set of initial locations, $C$ is an (ordered) set of clocks, $X \subseteq
L \times \varphi(C) \times 2^C \times (L \cup T)$ is the transition relation, $\eta : L \to \mathbb{Z}$ is the price function, $T$ is the
set of target locations, $T \cap L = \emptyset$; and $f_{goal} : T \to \mathcal{F}$ assigns external cost functions to target
locations.

We refer to Player 1 as the controller and Player 2 as the environment. A priced timed game
begins with a token placed on some initial location $\ell$ with valuation $\mathbf{0}$ and accumulated cost
$0$. At each round, the player who controls the current location $\ell$ chooses a
delay $t$ (to be elapsed in $\ell$) and an outgoing transition $e = (\ell, g, r, \ell') \in X$ to be taken after a delay of $t$
at $\ell$. The clock valuation is then updated according to the delay $t$ and the reset $r$, the cost
is incremented by $\eta(\ell) \cdot t$, and the token is moved to the location $\ell'$. The two players continue
moving the token in this fashion, and give rise to a sequence of locations and transitions called
a play of the game. A configuration or state of a PTG is a tuple $(\ell, \nu, c)$ where $\ell \in L$ is a
location, $\nu \in \mathbb{R}^{|C|}$ is a valuation, and $c$ is the cost accumulated from the start of the play. We
assume, w.l.o.g [ 23 , that the clock valuations are bounded. 

► Definition 2 (PTG semantics). The semantics of a PTG $\mathcal{G}$ is a labelled state-transition
game arena $[\![\mathcal{G}]\!] = (S = S_1 \uplus S_2, S_{init}, A, E, \pi, \kappa)$ where

- $S_j = L_j \times \mathbb{R}^{|C|}$ are the Player $j$ states with $S = S_1 \uplus S_2$,

- $S_{init} \subseteq S$ are initial states s.t. $(\ell, \nu) \in S_{init}$ if $\ell \in L_{init}$ and $\nu = \mathbf{0}$,

- $A = \mathbb{R}_{\ge 0} \times X$ is the set of timed moves,

- $E : (S \times A) \to S$ is the transition function s.t. for $s = (\ell, \nu), s' = (\ell', \nu') \in S$ and $\tau =
(t, e) \in A$ the function $E(s, \tau)$ is defined if $e = (\ell, g, r, \ell')$ is a transition of the PTG and
$\nu + t \in [\![g]\!]$; moreover $E(s, \tau) = s'$ if $\nu' = (\nu + t)[r := 0]$ (we write $s \xrightarrow{\tau} s'$ when $E(s, \tau) = s'$);

- $\pi : S \times A \to \mathbb{R}$ is the price function such that $\pi((\ell, \nu), (t, e)) = \eta(\ell) \cdot t$; and

- $\kappa : S \to \mathbb{R}$ is an external cost function such that $\kappa(\ell, \nu)$ is defined when $\ell \in T$ such that
$\kappa(\ell, \nu) = f_{goal}(\ell)(\nu)$.

A play $\rho = \langle s_0, \tau_1, s_1, \tau_2, \ldots, s_n \rangle$ is a finite sequence of states and actions s.t. $s_0 \in S_{init}$
and $s_i \xrightarrow{\tau_{i+1}} s_{i+1}$ for all $0 \le i < n$. The infinite plays are defined in an analogous manner.
For a finite play $\rho$ we write its last state as $last(\rho) = s_n$. For a (finite or infinite) play $\rho$
we write $stop(\rho)$ for the index of the first target state; if the play does not visit a target state then
$stop(\rho) = \infty$. We denote the set of plays as $Plays_{\mathcal{G}}$. For a play $\rho = \langle s_0, (t_1, a_1), s_1, (t_2, a_2), \ldots \rangle$,
if $stop(\rho) = n < \infty$ then $Cost_{\mathcal{G}}(\rho) = \kappa(s_n) + \sum_{i=1}^{n} \pi(s_{i-1}, (t_i, a_i))$, else $Cost_{\mathcal{G}}(\rho) = +\infty$.
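The cost of a play can be illustrated with a small Python sketch: it sums the location prices weighted by the delays up to the first target state and then adds the external cost at that target. The flat encoding of a play as a list of steps, the dictionaries `price` and `goal_cost`, and the use of a single clock value as the valuation are illustrative assumptions, not part of the formal model.

```python
from fractions import Fraction
import math

def play_cost(steps, price, goal_cost):
    """Cost of a play, stopping at the first target state.

    steps: list of (location, delay, next_location, next_valuation);
           this flat encoding of a play is a hypothetical choice.
    price: dict mapping each location to its price rate (eta).
    goal_cost: dict mapping each target location to its cost function (f_goal).
    Returns math.inf if the play never visits a target.
    """
    total = Fraction(0)
    for location, delay, nxt, valuation in steps:
        total += price[location] * Fraction(delay)    # pi(s, (t, e)) = eta(l) * t
        if nxt in goal_cost:                           # first target reached: stop
            return total + goal_cost[nxt](valuation)   # add the external cost
    return math.inf
```

For instance, a play that spends 1/2 time unit in a location of price 3 and then 2 time units in a location of price -1 before reaching a target with cost function $v \mapsto 2v$ at $v = 5/2$ has cost $3/2 - 2 + 5 = 9/2$.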

A strategy of player $j$ in $\mathcal{G}$ is a function $\sigma : Plays_{\mathcal{G}} \to A$ such that for a play $\rho$ the function
$\sigma(\rho)$ is defined if $last(\rho) \in S_j$. We say that a strategy $\sigma$ is memoryless if $\sigma(\rho) = \sigma(\rho')$ when
$last(\rho) = last(\rho')$, otherwise we call it memoryful. We write $Strat_1$ and $Strat_2$ for the sets of
strategies of player 1 and 2, respectively.

A play $\rho$ is said to be compatible with a strategy $\sigma$ of player $j \in \{1, 2\}$ if for every state $s_i$ in
$\rho$ that belongs to Player $j$, $\tau_{i+1} = \sigma(s_i)$. Given a pair of strategies $(\sigma_1, \sigma_2) \in Strat_1 \times Strat_2$
and a state $s$, the outcome of $(\sigma_1, \sigma_2)$ from $s$, denoted $Outcome(s, \sigma_1, \sigma_2)$, is the unique play
that starts at $s$ and is compatible with both strategies. Given a player 1 strategy $\sigma_1 \in Strat_1$
we define its cost $Cost_{\mathcal{G}}(s, \sigma_1)$ as $\sup_{\sigma_2 \in Strat_2} Cost_{\mathcal{G}}(Outcome(s, \sigma_1, \sigma_2))$. We now define the
optimal reachability-cost for Player 1 from a state $s$ as

$$OptCost_{\mathcal{G}}(s) = \inf_{\sigma_1 \in Strat_1} \sup_{\sigma_2 \in Strat_2} Cost_{\mathcal{G}}(Outcome(s, \sigma_1, \sigma_2)).$$

A strategy $\sigma_1 \in Strat_1$ is said to be optimal from $s$ if $Cost_{\mathcal{G}}(s, \sigma_1) = OptCost_{\mathcal{G}}(s)$. Since
optimal strategies may not always exist [5], we define $\varepsilon$-optimal strategies. For $\varepsilon > 0$ a strategy
$\sigma_\varepsilon \in Strat_1$ is called $\varepsilon$-optimal if $OptCost_{\mathcal{G}}(s) \le Cost_{\mathcal{G}}(s, \sigma_\varepsilon) < OptCost_{\mathcal{G}}(s) + \varepsilon$. Given a PTG
$\mathcal{G}$ and a bound $K \in \mathbb{Z}$, the cost-optimal reachability problem for PTGs is to decide whether
there exists a strategy for player 1 such that $OptCost_{\mathcal{G}}(s) \le K$ from some starting state $s$.

► Theorem 3 ([3]). The cost-optimal reachability problem is undecidable for PTGs with 3 clocks.

► Theorem 4 ([5], [10, 12]). The $\varepsilon$-optimal strategy is computable for 1-clock PTGs.

3 Robust Semantics 

Under the robust semantics of priced timed games the environment player, also called
the perturbator, is more privileged as it has the power to perturb any delay chosen by the
controller by an amount in $[-\delta, \delta]$, where $\delta > 0$ is a pre-defined bounded quantity. However,
in order to ensure time-divergence there is a restriction that the time delay at all locations of
the robust priced timed game (RPTG) must be $\ge \delta$. There are the following two perturbation semantics as defined in [7].

- Excess semantics. At any controller location, the time delay $t$ chosen by the controller
is altered to some $t' \in [t - \delta, t + \delta]$ by the perturbator. However, the constraints on
the outgoing transitions of the controller locations are evaluated with respect to the time
elapse $t$ chosen by the controller. If the constraint is satisfied with respect to $t$, then the
values of all variables which are not reset on the transition are updated with respect to
$t'$; the variables which are reset obtain value $0$.

- Conservative semantics. In this, the constraints on the outgoing transitions are evaluated
with respect to $t'$.

In both semantics, the delays chosen by perturbator at his locations are not altered, and the 
constraints on outgoing transitions are evaluated in the usual way, as in PTG. 
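The difference between the two semantics can be seen on a single controller move. The following Python sketch is only an illustration for one clock: the function name `controller_move`, the predicate encoding of guards, and the boolean `reset` flag are assumptions made here for the example.

```python
from fractions import Fraction

def controller_move(v, t, eps, guard, reset, semantics):
    """One perturbed controller move on a single clock value v.

    t is the delay proposed by the controller, eps in [-delta, delta] is the
    perturbation, guard is a predicate on clock values (hypothetical encoding).
    Returns the new clock value, or None if the move is not allowed.
    """
    proposed = v + t          # clock value the controller aims for
    actual = v + t + eps      # clock value after the perturbation
    # Excess: the guard is checked before the perturbation.
    # Conservative: the guard is checked after the perturbation.
    checked = proposed if semantics == "excess" else actual
    if not guard(checked):
        return None
    return Fraction(0) if reset else actual
```

With the guard $x = 1$, a proposed delay of 1 from $x = 0$ and a perturbation of $1/10$, the move is taken under the excess semantics (ending with $x = 11/10$) but is blocked under the conservative semantics.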

A Robust Priced Timed Automaton (RPTA) is an RPTG which has only controller locations.
At all these locations, for any time delay $t$ chosen by the controller, the perturbator can
implicitly perturb $t$ by a quantity in $[-\delta, \delta]$. The excess as well as the conservative perturbation
semantics for RPTA are defined in the same way as in the RPTG. Note that our
RPTA coincides with that of [7] when the cost functions at all target locations are of the form
$cf : \mathbb{R}_{\ge 0} \to \{0\}$. Our RPTG are turn-based, and have cost functions at the targets, while
the RPTGs studied in [7] are concurrent.

► Definition 5 (Excess Perturbation Semantics). Let $\mathcal{R} = (L_1, L_2, L_{init}, C, X, \eta, T, f_{goal})$ be
a RPTG. Given a $\delta > 0$, the excess perturbation semantics of RPTG $\mathcal{R}$ is a LTS $[\![\mathcal{R}]\!] =
(S, A, E)$ where $S = S_1 \cup S_2 \cup (T \times \mathbb{R}_{\ge 0})$, $A = A_1 \cup A_2$ and $E = E_1 \cup E_2$. We define the set
of states, actions and transitions for each player below.

- $S_1 = L_1 \times \mathbb{R}^{|C|}$ are the controller states,

- $S_2 = (L_2 \times \mathbb{R}^{|C|}) \cup (S_1 \times \mathbb{R}_{\ge 0} \times X)$ are the perturbator states. The first kind of states are
encountered at perturbator locations. The second kind of states are encountered when the
controller chooses a delay $t \in \mathbb{R}_{\ge 0}$ and a transition $e \in X$ at a controller location.

- $A_1 = \mathbb{R}_{\ge 0} \times X$ are controller actions,

- $A_2 = (\mathbb{R}_{\ge 0} \times X) \cup [-\delta, \delta]$ are perturbator actions. The first kind of actions ($\mathbb{R}_{\ge 0} \times X$) are
chosen at states of the form $L_2 \times \mathbb{R}^{|C|} \subseteq S_2$, while the second kind of actions are chosen
at states of the form $S_1 \times \mathbb{R}_{\ge 0} \times X \subseteq S_2$,

- $E_1 = (S_1 \times A_1 \times S_2)$ is the set of controller transitions such that for a controller state
$(\ell, \nu)$ and a controller action $(t, e)$, $E_1((\ell, \nu), (t, e))$ is defined iff there is a transition $e =
(\ell, g, a, r, \ell')$ in $\mathcal{R}$ such that $\nu + t \in [\![g]\!]$.

- $E_2 = S_2 \times A_2 \times (S_1 \cup S_2 \cup (T \times \mathbb{R}_{\ge 0}))$ is the set of perturbator transitions such that

  - For a perturbator state of the type $(\ell, \nu)$ and a perturbator action $(t, e)$, we have
  $(\ell', \nu') = E_2((\ell, \nu), (t, e))$ iff there is a transition $e = (\ell, g, a, r, \ell')$ in $\mathcal{R}$ such that $\nu + t \in
  [\![g]\!]$ and $\nu' = (\nu + t)[r := 0]$,

  - For a perturbator state of type $((\ell, \nu), t, e)$ and a perturbator action $\varepsilon \in [-\delta, \delta]$, we have
  $(\ell', \nu') = E_2(((\ell, \nu), t, e), \varepsilon)$ iff $e = (\ell, g, a, r, \ell')$ and $\nu' = (\nu + t + \varepsilon)[r := 0]$.

We now define the cost of the transitions, denoted $Cost(t, e)$, as follows:

- For controller transitions $(\ell, \nu) \xrightarrow{(t, e)} ((\ell, \nu), t, e)$: the cost accumulated is $Cost(t, e) = 0$.

- For perturbator transitions:

  - From perturbator states of type $(\ell, \nu)$: $(\ell, \nu) \xrightarrow{(t, e)} (\ell', \nu')$: the cost accumulated is
  $Cost(t, e) = t \cdot \eta(\ell)$.

  - From perturbator states of type $((\ell, \nu), t, e)$: $((\ell, \nu), t, e) \xrightarrow{\varepsilon} (\ell', \nu')$: the cost accumulated
  is $(t + \varepsilon) \cdot \eta(\ell)$. Note that although this transition has no edge choice involved and the
  perturbation delay chosen is $\varepsilon \in [-\delta, \delta]$, the controller action $(t, e)$ chosen in the state
  $(\ell, \nu)$ comes into effect in this transition. Hence for the sake of uniformity, we denote
  the cost accumulated in this transition by $Cost(t + \varepsilon, e) = (t + \varepsilon) \cdot \eta(\ell)$.

Note that we check satisfiability of the constraint $g$ before the perturbation; however, the
reset occurs after the perturbation. The notions of a path and a winning play are the
same as in PTG. We shall now adapt the definitions of the cost of a play and of a strategy
for the excess perturbation semantics. Let $\rho = \langle s_1, (t_1, e_1), s_2, (t_2, e_2), \ldots, (t_{n-1}, e_{n-1}), s_n \rangle$
be a path in the LTS $[\![\mathcal{R}]\!]$. Given a $\delta > 0$, for a finite play $\rho$ ending in a target location,
we define $Cost^{\delta}_{\mathcal{R}}(\rho) = \sum_{i=1}^{n-1} Cost(t_i, e_i) + f_{goal}(\ell_n)(\nu_n)$ as the sum of the costs of all
transitions as defined above along with the value from the cost function of the target location
$\ell_n$. Also, we re-define the cost of a strategy $\sigma_1$ from a state $s$ for a given $\delta > 0$ as
$Cost^{\delta}_{\mathcal{R}}(s, \sigma_1) = \sup_{\sigma_2 \in Strat_2(\mathcal{R})} Cost^{\delta}_{\mathcal{R}}(Outcome(s, \sigma_1, \sigma_2))$. Similarly, $OptCost^{\delta}_{\mathcal{R}}$ is the optimal
cost under excess perturbation semantics for a given $\delta > 0$ defined as

$$OptCost^{\delta}_{\mathcal{R}}(s) = \inf_{\sigma_1 \in Strat_1(\mathcal{R})} \sup_{\sigma_2 \in Strat_2(\mathcal{R})} Cost^{\delta}_{\mathcal{R}}(Outcome(s, \sigma_1, \sigma_2)).$$

Since optimal strategies may not always exist, we define $\varepsilon$-optimal strategies: for
$\varepsilon > 0$, a strategy $\sigma_1$ is $\varepsilon$-optimal if $OptCost^{\delta}_{\mathcal{R}}(s) \le Cost^{\delta}_{\mathcal{R}}(s, \sigma_1) < OptCost^{\delta}_{\mathcal{R}}(s) + \varepsilon$. Given a $\delta$ and a RPTG $\mathcal{R}$
with a single clock $x$, a strategy $\sigma_1$ is called $(\varepsilon, N)$-acceptable [5] for $\varepsilon > 0$, $N \in \mathbb{N}$ when
(1) it is memoryless, (2) it is $\varepsilon$-optimal, and (3) there exist $N$ consecutive intervals $(I_i)_{1 \le i \le N}$
partitioning $[0, 1]$ such that for every location $\ell$, for every $1 \le i \le N$ and every integer $\alpha < M$
(where $M$ is the maximum bound on the clock value), the function that maps the clock values
$\nu(x)$ to the cost of the strategy $\sigma_1$ at every state $(\ell, \nu(x))$, i.e. $\nu(x) \mapsto Cost^{\delta}_{\mathcal{R}}((\ell, \nu(x)), \sigma_1)$, is
affine on every interval $\alpha + I_i$. Also, the strategy $\sigma_1$ is constant over the values $\alpha + I_i$ at all
locations, that is, when $\nu(x) \in \alpha + I_i$, the strategy $\sigma_1(\ell, \nu(x))$ is constant. The number $N$ is
an important attribute of the strategy as it establishes that the strategy does not fluctuate
infinitely often and is implementable.

Now, we shall define limit variations of costs, strategies and values as $\delta \to 0$. The limit-cost
of a controller strategy $\sigma_1$ from state $s$ is defined over all plays $\rho$ starting from $s$ that
are compatible with $\sigma_1$ as:

$$LimCost_{\mathcal{R}}(s, \sigma_1) = \lim_{\delta \to 0} \sup_{\sigma_2 \in Strat_2(\mathcal{R})} Cost^{\delta}_{\mathcal{R}}(Outcome(s, \sigma_1, \sigma_2)).$$

The limit-strategy upper-bound problem [7] for excess perturbation semantics asks, given a
RPTG $\mathcal{R}$, a state $s = (\ell, \mathbf{0})$ with cost 0 and a rational number $K$, whether there exists a
strategy $\sigma_1$ such that $LimCost_{\mathcal{R}}(s, \sigma_1) \le K$. The following are the main results of [7].
► Theorem 6 (Known results [7]). 1. The limit-strategy upper-bound problem is undecidable
for RPTA and RPTG under excess perturbation semantics, for $\ge 10$ clocks.
2. For a fixed S £ [0, |], and a given RPTA A, a target location l and a rational K, it is 
undecidable whether $\inf_{\sigma_1} \sup_{\sigma_2} cost_{\sigma_1, \sigma_2}(\rho) < K$ such that $\rho$ ends in $l$. Here $cost_{\sigma_1, \sigma_2}(\rho)$
is the cost of the unique run $\rho$ obtained from the pair of strategies $(\sigma_1, \sigma_2)$.

We consider a semantic subclass of RPTGs in which the accumulated cost of any cycle 
is non-negative: that is, any iteration of a cycle will always have a non-negative cost. 
Consider the two cycles depicted. The one on top has a non-negative cost, while the one 
below always has a negative cost. In the cycle below, the perturbator will not perturb, 
since that will lead to a target state. In the rest of the paper, we consider this semantic class 
of RPTGs (RPTAs), and prove decidability and undecidability results; however, we will refer 
to them as RPTGs(RPTAs). Our key contributions are the following theorems. 

► Theorem 7. The limit-strategy upper-bound problem is undecidable for RPTA with 5 clocks,
location prices in $\{0, 1\}$, and cost functions $cf : \mathbb{R}_{\ge 0} \to \{0\}$ at all target locations.

► Theorem 8. Given a 1-clock RPTG $\mathcal{R}$ and a $\delta > 0$, we can compute $OptCost^{\delta}_{\mathcal{R}}(s)$ for
every state $s = (\ell, \nu)$. For every $\varepsilon > 0$, there exists an $N \in \mathbb{N}$ such that the controller has an
$(\varepsilon, N)$-acceptable strategy.

The rest of the paper is devoted to the proof sketches of these two theorems, while we give 
detailed proofs in the appendix. 

4 Undecidability with 5 clocks 

In this section, we improve the result of [7] by showing that the limit-strategy upper-bound
problem is undecidable for robust priced timed automata with 5 or more clocks. The undecidability
result is obtained using a reduction from the halting problem of two-counter machines.

A two-counter machine has counters $C_1$ and $C_2$, and a list of instructions $I_1, I_2, \ldots, I_n$,
where $I_n$ is the halt instruction. For each $1 \le i \le n - 1$, $I_i$ is one of the following instructions:
increment $c_b$: $c_b := c_b + 1$; goto $I_j$, for $b = 1$ or $2$; decrement $c_b$ with zero test:
if ($c_b = 0$) goto $I_j$ else $c_b := c_b - 1$; goto $I_k$, where $c_1, c_2$ represent the counter values.
The initial values of both counters are 0. Given the initial configuration $(I_1, 0, 0)$, the halting
problem for two-counter machines is to decide whether a configuration $(I_n, c_1, c_2)$ is reachable, with
$c_1, c_2 \ge 0$. This problem is known to be undecidable.
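As an illustration of the machine model being simulated, the following Python sketch interprets a two-counter machine for a bounded number of steps. The tuple encoding of instructions, the separate jump targets for the zero and non-zero branches, and the `max_steps` cut-off are assumptions made for this example; since halting is undecidable, a bounded run can only confirm halting, never refute it.

```python
def run_counter_machine(program, max_steps=10_000):
    """Run a two-counter machine from configuration (I_1, 0, 0).

    program: dict mapping an instruction index to one of
        ("inc", b, j)              increment counter b, goto j
        ("dec", b, j_zero, j_pos)  if counter b is 0 goto j_zero,
                                   else decrement it and goto j_pos
        ("halt",)
    Returns the halting configuration (index, c1, c2), or None if the
    machine has not halted within max_steps steps.
    """
    pc, counters = 1, {1: 0, 2: 0}
    for _ in range(max_steps):
        instr = program[pc]
        if instr[0] == "halt":
            return pc, counters[1], counters[2]
        if instr[0] == "inc":
            _, b, j = instr
            counters[b] += 1
            pc = j
        else:
            _, b, j_zero, j_pos = instr
            if counters[b] == 0:
                pc = j_zero
            else:
                counters[b] -= 1
                pc = j_pos
    return None
```

For example, a program that increments the first counter twice, then decrements it back to zero and halts ends in the configuration with both counters equal to 0.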

We simulate the two-counter machine using a RPTA with 5 clocks $x_1$, $z$, $x_2$, $y_1$ and $y_2$
under the excess perturbation semantics. The counters are encoded in clocks $x_1$ and $z$ as
$x_1 = \frac{1}{2^i} + \varepsilon_1$ and $z = \frac{1}{2^j} + \varepsilon_2$ where $i, j$ are respectively the values of counters $C_1$, $C_2$, and $\varepsilon_1$
and $\varepsilon_2$ denote accumulated values due to possible perturbations. Clocks $x_2$, $y_1$ and $y_2$ help
with the rough work. The simulation is achieved as follows: for each instruction, we have
a module simulating it. Upon entering the module, the clocks are in their normal form, i.e.
$x_1 = \frac{1}{2^i} + \varepsilon_1$, $z = \frac{1}{2^j} + \varepsilon_2$, $x_2 = 0$ and $y_1 = y_2 = 0$.



4.1 Increment module 

The module in Figure 1 simulates the increment of counter $C_1$. The value of counter $C_2$
remains unchanged since the value of clock $z$ remains unchanged at the exit from the module.
Upon entering $A$ the clock values are $x_1 = \frac{1}{2^i} + \varepsilon_1$, $z = \frac{1}{2^j} + \varepsilon_2$, $x_2 = y_1 = y_2 = 0$. Here $\varepsilon_1$
and $\varepsilon_2$ respectively denote the perturbations accumulated so far. We denote by $\alpha$ the value
of clock $x_1$, i.e. $\frac{1}{2^i} + \varepsilon_1$. Thus at $A$, the delay is $1 - \alpha$. Note that the dashed edges are
unperturbed (this is a shorthand notation; a small gadget that implements this is described
in Appendix B), so $x_1 = 1$ on entering $B$. No time elapse happens at $B$, and at $C$, controller
Figure 1 Increment $C_1$ module: The module keeps the fractional part of the clock $z$ unchanged.
The dashed edges represent unperturbed edges (detailed in Appendix B).


chooses a delay $t$. This $t$ must be $\frac{\alpha}{2}$ to simulate the increment correctly. The delay $t$ can be perturbed
by an amount $\delta$ by the perturbator, where $\delta$ can be either positive or negative, obtaining
$x_2 = t + \delta$, $x_1 = 0$, $y_1 = 1 - \alpha + t + \delta$ on entering $D$. At $D$, the delay is $\alpha - t - \delta$. Thus the
total delay from the entry point $A$ in this module to the mChoice module is 1 time unit. At
the entry of the mChoice module (the mChoice and Restore modules are in Appendix B), the
clock values are $x_1 = \alpha - t - \delta$, $z = 1 + \frac{1}{2^j} + \varepsilon_2$, $x_2 = \alpha$, $y_1 = 1$, $y_2 = 0$. To correctly simulate
the increment of $C_1$, $t$ should be exactly $\frac{\alpha}{2}$.
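The clock arithmetic of the increment module can be checked mechanically. The sketch below is a hedged illustration, not the construction itself: it replays the delays described above with exact rationals, writing `alpha` for the entry value of $x_1$ and `delta` for the perturbation applied at $C$. It assumes that $x_2$ is reset on entering $C$, that $x_1$ is reset on leaving $C$, and that the correct delay is half of `alpha`, as the test $2t - \alpha > 0$ used later in the verification module suggests.

```python
from fractions import Fraction

def increment_module(alpha, z, t, delta):
    """Clock values at the entry of mChoice, following the delays in the text.

    alpha: value of x1 on entering A; z: value of clock z on entering A;
    t: delay chosen by the controller at C; delta: perturbation of that delay.
    """
    # At A: wait 1 - alpha (unperturbed), so x1 = 1; no time elapses at B.
    z = z + (1 - alpha)
    y1 = 1 - alpha
    # At C: x2 starts from 0 (assumed reset); the delay t is perturbed to t + delta.
    x2 = t + delta
    z = z + t + delta
    y1 = y1 + t + delta
    x1 = Fraction(0)            # x1 is reset on leaving C (assumed)
    # At D: wait alpha - t - delta, making the total delay since A equal to 1.
    d = alpha - t - delta
    return {"x1": x1 + d, "z": z + d, "x2": x2 + d, "y1": y1 + d}
```

With `alpha = 1/4`, `t = 1/8` and `delta = 1/100`, the sketch yields $x_1 = \alpha/2 - \delta = 23/200$, $x_2 = \alpha$, $y_1 = 1$, and $z$ increased by exactly one time unit.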

At the mChoice module, the perturbator can either continue the simulation (by going through
the Restore module) or verify the correctness of the controller's delay (check $t = \frac{\alpha}{2}$). The mChoice
module adds 3 units to the values of $x_1$, $x_2$ and $z$, and resets $y_1$, $y_2$. Due to the mChoice
module, the clock values are $x_1 = 3 + \alpha - t - \delta$, $z = 4 + \frac{1}{2^j} + \varepsilon_2$, $x_2 = 3 + \alpha$, $y_1 = 1$, $y_2 = 0$.
If the perturbator chooses to continue the simulation, then the Restore module brings all the clocks
back to normal form. Hence upon entering $F$, the clock values are $x_1 = \alpha - t - \delta$, $z =
\frac{1}{2^j} + \varepsilon_2$, $x_2 = y_1 = 1$, $y_2 = 0$. This value of $x_1$ is $\frac{1}{2^{i+1}} + \varepsilon_1$, since $t = \frac{\alpha}{2}$ and $\varepsilon_1 = -\delta$, the
perturbation effect.

Let us now see how the perturbator verifies $t = \frac{\alpha}{2}$ by entering the Choice module. The
Choice module also adds 3 units to the values of $x_1$, $x_2$ and $z$, and resets $y_1$, $y_2$. The module
Test Inc$_{>}$ is invoked to check if $t > \frac{\alpha}{2}$, and the module Test Inc$_{<}$ is invoked to check if
$t < \frac{\alpha}{2}$. Note that using the mChoice module and the Choice module one after the other, the
clock values upon entering Test Inc$_{>}$ or Test Inc$_{<}$ are $x_1 = 6 + \alpha - t - \delta$, $z = 7 + \frac{1}{2^j} + \varepsilon_2$, $x_2 =
6 + \alpha$, $y_1 = 0$, $y_2 = 0$.

Test Inc$_{>}$: The delay at $A'$ is $1 - \alpha + t + \delta$, obtaining $x_2 = 7 + t + \delta$, and the cost
accumulated is $1 - \alpha + t + \delta$. At $B'$, $1 - t - \delta$ time is spent, obtaining $x_1 = 1 - t - \delta$. Finally,
at $C'$, a time $t + \delta$ is spent, and at $D'$, one time unit, making the total cost accumulated
$2 - \alpha + 2t + 2\delta$ at the target location. The cost function at the target assigns the cost 0 for
all valuations, hence the total cost to reach the target is $2 + 2t - \alpha + 2\delta$, which is greater than
$2 + 2\delta$ iff $2t - \alpha > 0$, i.e. iff $t > \frac{\alpha}{2}$.
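The threshold behaviour of this test can be checked with exact arithmetic. The sketch below adds up the cost contributions listed above; it assumes that no cost is accumulated at $B'$, since the stated total $2 - \alpha + 2t + 2\delta$ leaves out the time spent there. The function names are illustrative only.

```python
from fractions import Fraction

def inc_check_cost(alpha, t, delta):
    """Total cost on reaching the target of the verification module."""
    cost_a = 1 - alpha + t + delta   # delay at A'
    cost_c = t + delta               # delay at C' (B' assumed to cost nothing)
    cost_d = 1                       # one time unit at D'
    return cost_a + cost_c + cost_d

def controller_overshoots(alpha, t, delta):
    """True iff the cost exceeds 2 + 2*delta, i.e. iff t > alpha / 2."""
    return inc_check_cost(alpha, t, delta) > 2 + 2 * delta
```

For `alpha = 1/4`, the delay `t = 1/8` gives a cost of exactly $2 + 2\delta$, whereas any larger delay, such as `t = 1/4`, pushes the cost strictly above that bound.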

► Lemma 9. Assume that an increment $C_b$ ($b \in \{1, 2\}$) module is entered with the clock
valuations in their normal forms. Then the controller has a strategy to reach either the location $l_j$
corresponding to instruction $I_j$ of the two-counter machine, or a target location with
cost at most $2 + |2\delta|$, where $\delta$ is the perturbation added by the perturbator.


4.2 Complete Reduction 

The entire reduction consists of constructing a module corresponding to each instruction $I_i$,
$1 \le i \le n$, of the two-counter machine. The first location of the module corresponding to
instruction $I_1$ is the initial location. We simulate the halting instruction $I_n$ by a target location
with cost function $cf : \mathbb{R}_{\ge 0} \to \{0\}$. We denote the robust timed automaton simulating the
two-counter machine by $\mathcal{A}$; $s$ is the initial state $(l, \mathbf{0}, 0)$.

► Lemma 10. The two-counter machine halts if and only if there is a strategy $\sigma$ of the controller
such that $limcost_{\mathcal{A}}(\sigma, s) \le 2$.

The details of the decrement and zero test modules are in Appendix B. They are similar
to the increment module; if player 2 desires to verify the correctness of player 1's simulation,
a cost $> 2 + |2\delta|$ is accumulated on reaching a target location iff player 1 cheats. In the
limit, as $\delta \to 0$, the limcost will be $> 2$ iff the controller cheats. The other possibility to obtain
a limcost $> 2$ is when the two-counter machine does not halt.


5 Decidability of One-clock RPTG 


[Figure: A dwell-time PTG with two locations A and B.]


In order to show the decidability of the optimal reachability game for a 1-clock RPTG $\mathcal{R}$
and a fixed $\delta > 0$, we perform a series of reachability and optimal-cost preserving transformations.
The idea is to reduce the RPTG to a simpler priced timed game, while
preserving the optimal costs. The advantage of this conversion is that the semantics of
PTGs are easier to understand, and one can adapt known algorithms to solve PTGs.
On the other hand, the PTGs that we obtain are 1-clock PTGs with dwell-time requirements
(restrictions on the minimum as well as the maximum amount of time spent at certain locations);
see, for example, a dwell-time PTG with two locations $A$, $B$. A minimum of 1 and a
maximum of 2 units of time should be spent at $A$, while a maximum of 3 time units can
be spent at $B$. If we wish to model this using standard PTGs, we need one extra clock, and
we cannot use the decidability results for 1-clock PTGs to show the decidability of our model.
We show in Section A.4 how to solve 1-clock PTGs with dwell-time requirements.

Our transformations are as follows: (i) for a given $\delta$, our first transformation reduces the
RPTG $\mathcal{R}$ to a dwell-time PTG $\mathcal{G}$ (Section 5.1); (ii) our second transformation restricts to
dwell-time PTGs where the clock is bounded by $1 + \delta$. To achieve this, we use a notion
of fractional resets, and denote these PTGs as $\mathcal{G}_{\mathcal{F}}$ (Section 5.2); (iii) our third and last
transformation restricts to $\mathcal{G}_{\mathcal{F}}$ without resets (Section 5.3). The reset-free dwell-time PTG is
denoted $\mathcal{G}'_{\mathcal{F}}$. For each transformation, we prove that the optimal cost in each state of the
original game is the same as the optimal cost at some corresponding state of the new game.
We also show that an $(\varepsilon, N)$-strategy of the original game can be computed from some $(\varepsilon', N')$-strategy
in the new game. The details of each transformation and its correctness are established in
subsequent sections. We then solve $\mathcal{G}'_{\mathcal{F}}$ employing a technique inspired by [5] while ensuring
that the robust semantics are satisfied.


5.1 Transformation 1: RPTG $\mathcal{R}$ to dwell-time PTG $\mathcal{G}$


[Figure: $\mathcal{R}$ and $\mathcal{G}$: a controller transition from A to B and the game graph that replaces it.]

Given a one-clock RPTG $\mathcal{R} = (L_1, L_2, \{x\}, X, \eta, T, f_{goal})$ and a $\delta > 0$, we
construct a dwell-time PTG $\mathcal{G} = (L_1, L_2 \cup L', \{x\}, X', \eta', T, f_{goal})$. All the
controller and perturbator locations of $\mathcal{R}$ ($L_1$ and $L_2$) are carried over respectively
as player 1 and player 2 locations in $\mathcal{G}$. In addition, we have some new player 2
locations $L'$ in $\mathcal{G}$. The dwell-time PTG $\mathcal{G}$ constructed has dwell-time restrictions
for the new player 2 locations $L'$. The locations of $L'$ are either urgent, or
have a dwell-time of $[\delta, 2\delta]$ or $[0, \delta]$. All the perturbator transitions of $\mathcal{R}$ are
retained as they are in $\mathcal{G}$. Every transition in $\mathcal{R}$ from a controller location $A$ to
some location $B$ is replaced in $\mathcal{G}$ by a game graph as shown. Let $e = (A, g, r, B)$
be the transition from a controller location $A$ to a location $B$ with guard $g$ and
reset $r$. Depending on the guard $g$, in the transformed game graph, we have
the new guard $g'$. If $g$ is $x = H$, then $g'$ is $x = H - \delta$, while if $g$ is $H < x < H + 1$, then $g'$ is
$H - \delta < x < H + 1 - \delta$, for $H > 0$. When $g$ is $0 \le x < K$, then $g'$ is $0 \le x < K - \delta$, and $x = 0$
stays unchanged. It can be seen that doing this transformation to all the controller edges of
a RPTG $\mathcal{R}$ gives rise to a dwell-time PTG $\mathcal{G}$.

Let us consider the transition from $A$ to $B$ in $\mathcal{R}$. Assume that the transition from $A$ to $B$
(called edge $e$) has the constraint $x = 1$, and assume that $x = \nu$ on entering $A$. Then, in $\mathcal{R}$, the
controller elapses a time $1 - \nu$ and reaches $B$; however, on reaching $B$, the value of $x$ is in
the range $[1 - \delta, 1 + \delta]$ depending on the perturbation. Also, the cost accumulated at $A$ is
$k \cdot (1 - \nu + \gamma)$, where $\gamma \in [-\delta, \delta]$. To take into consideration these semantic restrictions of $\mathcal{R}$,
we transform the RPTG $\mathcal{R}$ into a dwell-time PTG $\mathcal{G}$. First of all, we change the constraint
$x = 1$ into $x = 1 - \delta$ from $A$ (a player 1 location) and enter a new player 2 location $(A, e)$.
This player 2 location is an urgent location. The correct strategy for player 1 is to spend a
time $1 - \nu - \delta$ at $A$ (corresponding to the time $1 - \nu$ he spent at $A$ in $\mathcal{R}$). At $(A, e)$, player 2
can proceed to one of the player 2 locations $(A, e)^-$ or $(A, e)^+$. The player 2 location
$(A, e)$ models the perturbator's choices of positive or negative perturbation in $\mathcal{R}$. If player 2 goes
to $(A, e)^-$, then on reaching $B$, the value of $x$ is in the interval $[1 - \delta, 1]$ (this corresponds
to the perturbator's choice of $[-\delta, 0]$ in $\mathcal{R}$), and if he goes to $(A, e)^+$, then the value of $x$ at $B$ is
in the interval $[1, 1 + \delta]$ (this corresponds to the perturbator's choice of $[0, \delta]$ in $\mathcal{R}$). The reset
happening on the transition from $A$ to $B$ in $\mathcal{R}$ is now done on the transitions from $(A, e)^-$ to
$B$ and from $(A, e)^+$ to $B$. Thus, note that the possible ranges of $x$ as well as the accumulated
cost in $\mathcal{R}$ while reaching $B$ are preserved in the transformed dwell-time PTG.
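The ranges in this example can be recomputed from the dwell-time restrictions. The Python sketch below is an illustration under stated assumptions: it takes the guard $x = 1$ with no reset, assumes the negative branch has dwell-time $[0, \delta]$ and the positive branch has dwell-time $[\delta, 2\delta]$, and uses a hypothetical function name.

```python
from fractions import Fraction

def x_range_at_B(delta, branch):
    """Range of x on reaching B through the gadget for the guard x = 1.

    branch: "-" for the negative-perturbation location (dwell-time [0, delta],
    assumed) or "+" for the positive one (dwell-time [delta, 2*delta]).
    """
    leave_a = 1 - delta   # the shifted guard x = 1 - delta fires when leaving A
    dwell = {"-": (Fraction(0), delta), "+": (delta, 2 * delta)}
    low, high = dwell[branch]
    return (leave_a + low, leave_a + high)
```

For `delta = 1/10`, the two branches give $[9/10, 1]$ and $[1, 11/10]$, whose union is the range $[1 - \delta, 1 + \delta]$ reachable in the robust game.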

► Lemma 11. Let $\mathcal{R}$ be a RPTG and $\mathcal{G}$ be the corresponding dwell-time PTG obtained using
the transformation above. Then for every state $s$ in $\mathcal{R}$, $OptCost^{\delta}_{\mathcal{R}}(s) = OptCost_{\mathcal{G}}(s)$. An
$(\varepsilon, N)$-strategy in $\mathcal{R}$ can be computed from an $(\varepsilon, N)$-strategy in $\mathcal{G}$ and vice versa.

Proof. In Appendix C.


5.2 Transformation 2: Dwell-time PTG $\mathcal{G}$ to Dwell-time FRPTG $\mathcal{G}_{\mathcal{F}}$



Recall that the locations of the dwell-time PTG $\mathcal{G}$ are $L_1 \cup L_2 \cup L'$,
where $L_1 \cup L_2$ is the set of locations of $\mathcal{R}$ and $L'$ are the new player 2
locations introduced in $\mathcal{G}$. In this section, we transform the dwell-time PTG
$\mathcal{G}$ into a dwell-time PTG $\mathcal{G}_{\mathcal{F}}$ with the restriction that the
value of x is in $[0, 1]$ at all locations corresponding to $L_1 \cup L_2$, and is in
$[0, 1 + \delta]$ at all locations corresponding to $L'$. While this transformation is the
same as that used in [5], the main difference is that we introduce special resets called
fractional resets, which reset only the integral part of clock x while its fractional part
is retained. For instance, if the value of x is 1.3, then the operation $[x] := 0$ changes
the value of x to 0.3.
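A fractional reset can be illustrated in a few lines. This is only a sketch with hypothetical function names; exact rationals are used so that the 1.3 example is not blurred by floating-point rounding.

```python
from fractions import Fraction
from math import floor

def fractional_reset(x):
    """The operation [x] := 0: drop the integral part of x, keep its fractional part."""
    return x - floor(x)

def ordinary_reset(x):
    """An ordinary reset x := 0, shown for contrast."""
    return Fraction(0)

x = Fraction(13, 10)              # the value 1.3 from the text
after_fractional = fractional_reset(x)
after_ordinary = ordinary_reset(x)
```

The contrast with `ordinary_reset` shows why the fractional reset matters: a clock value of $1 + \gamma$ produced by a positive perturbation $\gamma$ keeps the amount $\gamma$ after `fractional_reset`, whereas an ordinary reset would erase it.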

Given a one-clock dwell-time PTG $\mathcal{G} = (L_1, L_2 \cup L', \{x\}, X, \eta, T, f_{goals})$
with M being the maximum value that can be assumed by clock x, we define a dwell-time PTG with
fractional resets (FRPTG) $\mathcal{G}_{\mathcal{F}}$. In $\mathcal{G}_{\mathcal{F}}$, we have
M + 1 copies of the locations in $L_1 \cup L_2$ as well as of the locations in $L'$ with
dwell-time restrictions $[0, \delta]$ and $[0, 0]$. These M + 1 copies of $L'$ have the same
dwell-time restrictions in $\mathcal{G}_{\mathcal{F}}$. The copies are indexed by i,
$0 \le i \le M$, capturing the integral part of clock x in $\mathcal{G}$. Finally, we have in
$\mathcal{G}$ the locations of $L'$ with dwell-time restriction $[\delta, 2\delta]$. For each
such location $(A, e)^+$, we have in $\mathcal{G}_{\mathcal{F}}$ the locations $(A, e)^+_i$ and
$(A, e)^0_{i+1}$ for $0 \le i \le M$. The dwell-time restriction for $(A, e)^+_i$ is the same as
for $(A, e)^+$, while the locations $(A, e)^0_{i+1}$ are urgent. The prices of locations are
carried over unchanged to the various copies.


The transitions in $\mathcal{G}_{\mathcal{F}}$ consist of the following, where $g - i$ denotes
the constraint obtained by shifting the constraint g by $-i$:

(1) $l_i \xrightarrow{(g - i) \cap 0 \le x < 1} m_i$ if $l \xrightarrow{g} m \in X$;
(2) $l_i \xrightarrow{(g - i) \cap 0 \le x < 1;\ \{x\}} m_0$ if $l \xrightarrow{g;\ \{x\}} m \in X$;
(3) $l_i \xrightarrow{x = 1,\ \{x\}} l_{i+1}$, for $l \in L_1 \cup L_2$, and
    $(A, e)^+_i \xrightarrow{x \ge 1,\ [x] := 0} (A, e)^0_{i+1}$ for $i < M$.

Consider, for example, the constraint $g'$ between A and $(A, e)$ as $x = (b + 1) - \delta$ in
$\mathcal{G}$. Then the value of x is $b + (1 - \delta)$, for $b < M$, when $(A, e)^+$ is entered
in $\mathcal{G}$. The location $(A, e)^+$ with $v(x) = b + (1 - \delta)$ is represented in
$\mathcal{G}_{\mathcal{F}}$ as $(A, e)^+_b$ with $v(x) = 1 - \delta$. If player 2 spends
$[\delta, 2\delta]$ time at $(A, e)^+$ in $\mathcal{G}$, then $v(x) \in [b + 1, b + 1 + \delta]$.
If there are no resets on the way to B, then $v(x) \in [b + 1, (b + 1) + \delta]$ at B.
Correspondingly, in $\mathcal{G}_{\mathcal{F}}$, $v(x) \in [1, 1 + \delta]$ at $(A, e)^+_b$. By
construction, $B_b$ is not reachable, since we check $0 \le x < 1$ on the transition to $B_b$.
The fractional reset is employed to obtain $x = \delta$ while moving to $(A, e)^0_{b+1}$. This
ensures that $x = \delta$ on reaching $B_{b+1}$, thereby preserving the perturbation and keeping
$x < 1$. A normal reset would have destroyed the value obtained by perturbation. The mapping f
between states of $\mathcal{G}$ and $\mathcal{G}_{\mathcal{F}}$ is as follows:

- $f(l, x) = (l_b, x - b)$, for $b < M$, $x \in [b, b + 1]$ and $l \in L_1 \cup L_2$;
- $f((A, e), x) = ((A, e)_b, x - b)$, for $b < M$ and $x \in [b, b + 1]$;
- $f((A, e)^-, x) = ((A, e)^-_b, x - b)$, for $b < M$ and $x \in [b, b + 1]$;
- $f((A, e)^+, x) = ((A, e)^+_b, x - b)$, for $b < M$ and $x \in [b, b + 1] \cup [b + 1, b + 2]$.

Note that in the last case, the value of $x - b$ can exceed 1 but is less than or equal to
$1 + \delta$.

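► Lemma 12. For every state $(l, v)$ in $\mathcal{G}$, $OptCost_{\mathcal{G}}(l, v)$ in
$\mathcal{G}$ is the same as $OptCost_{\mathcal{G}_{\mathcal{F}}}(f(l, v))$ in
$\mathcal{G}_{\mathcal{F}}$. For every $\epsilon > 0$, $N \in \mathbb{N}$, an
$(\epsilon, N)$-acceptable strategy in $\mathcal{G}$ can be computed from an
$(\epsilon, N)$-acceptable strategy in $\mathcal{G}_{\mathcal{F}}$ and vice versa.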


5.3 Transformation 3: Dwell-time FRPTG $\mathcal{G}_{\mathcal{F}}$ to resetfree FRPTG $\mathcal{G}'_{\mathcal{F}}$


We now apply the final transformation to the FRPTG $\mathcal{G}_{\mathcal{F}}$ and construct a
reset-free version of the FRPTG, denoted $\mathcal{G}'_{\mathcal{F}}$. Assume that there are a
total of n resets (including fractional resets) in the FRPTG. $\mathcal{G}'_{\mathcal{F}}$
consists of n + 1 copies of the FRPTG: $\mathcal{G}_{\mathcal{F}_0}, \mathcal{G}_{\mathcal{F}_1},
\dots, \mathcal{G}_{\mathcal{F}_n}$. Given the locations L of the FRPTG, the locations of
$\mathcal{G}_{\mathcal{F}_i}$ are $L^i$, $0 \le i \le n$. $\mathcal{G}_{\mathcal{F}_0}$ starts
with $l^0$, where l is the initial location of the FRPTG, and continues until a resetting
transition happens. At the first resetting transition, $\mathcal{G}_{\mathcal{F}_0}$ makes a
transition to $\mathcal{G}_{\mathcal{F}_1}$. The n-th copy is directed to a sink target location
S with cost function $cf : \mathbb{R}_{\ge 0} \to \{+\infty\}$ on the (n+1)-th reset. Note that
each $\mathcal{G}_{\mathcal{F}_i}$ is reset-free. One crucial property of each
$\mathcal{G}_{\mathcal{F}_i}$ is that, on entering with some value of x in $[0, \delta]$, the
value of x only increases as the transitions go along in $\mathcal{G}_{\mathcal{F}_i}$;
moreover, $x \le 1 + \delta$ in each $\mathcal{G}_{\mathcal{F}_i}$ by construction. The formal
details and the proof of Lemma 13 can be found in Appendix E.
Using the cost function of S and those of the targets, we compute the optimal cost functions
for all the locations of the deepest component $\mathcal{G}_{\mathcal{F}_n}$. The cost functions
of the locations of $\mathcal{G}_{\mathcal{F}_i}$ are used to compute those of
$\mathcal{G}_{\mathcal{F}_{i-1}}$, and so on until the cost function of $l^0$, the starting
location of $\mathcal{G}_{\mathcal{F}_0}$, is computed. An example can be seen in Appendix F.





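► Lemma 13. For every state $(l, v)$ in $\mathcal{G}_{\mathcal{F}}$,
$OptCost_{\mathcal{G}_{\mathcal{F}}}(l, v) = OptCost_{\mathcal{G}'_{\mathcal{F}}}(l^0, v)$, where
$\mathcal{G}'_{\mathcal{F}}$ is the resetfree FRPTG. For every $\epsilon > 0$, $N \in \mathbb{N}$,
given an $(\epsilon, N)$-acceptable strategy $\sigma'$ in $\mathcal{G}'_{\mathcal{F}}$, we can
compute a $(2\epsilon, N)$-acceptable strategy $\sigma$ in $\mathcal{G}_{\mathcal{F}}$ and vice versa.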






5.4 Solving the Resetfree FRPTG 

Before we sketch the details, let us introduce some key notation. Observe that after our
simplifying transformations, the cost functions cf are piecewise-affine continuous functions
that assign a value to every valuation $x \in [0, 1 + \delta]$ (the construction of the FRPTG
ensures $x \le 1 + \delta$ always). The interior of two cost functions $f_1$ and $f_2$ is the
cost function $f_3 : [0, 1 + \delta] \to \mathbb{R}$ defined by $f_3(x) = \min(f_1(x), f_2(x))$.
Similarly, the exterior of $f_1$ and $f_2$ is the cost function
$f_4 : [0, 1 + \delta] \to \mathbb{R}$ defined by $f_4(x) = \max(f_1(x), f_2(x))$. Clearly, $f_3$
and $f_4$ are also piecewise-affine continuous. The interior and exterior can be computed
easily by superimposing $f_1$ and $f_2$, as shown graphically in the example, and taking the
lower envelope and the upper envelope respectively.
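The superimposition can be sketched in code. The representation below (a sorted list of breakpoints `(x, y)` on a common domain, at least two points per function) and the helper names are assumptions made for illustration, not notation from the paper.

```python
from bisect import bisect_right
from fractions import Fraction as F

def evaluate(points, x):
    """Evaluate a continuous piecewise-affine function given by sorted (x, y) breakpoints."""
    xs = [p[0] for p in points]
    i = min(max(bisect_right(xs, x) - 1, 0), len(points) - 2)
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def envelope(f, g, pick):
    """Superimpose f and g, add their crossing points, and keep pick(f(x), g(x))."""
    xs = sorted({p[0] for p in f} | {p[0] for p in g})
    breakpoints = []
    for left, right in zip(xs, xs[1:]):
        dl = evaluate(f, left) - evaluate(g, left)
        dr = evaluate(f, right) - evaluate(g, right)
        breakpoints.append(left)
        if dl * dr < 0:  # f and g cross strictly inside (left, right)
            breakpoints.append(left + (right - left) * dl / (dl - dr))
    breakpoints.append(xs[-1])
    return [(x, pick(evaluate(f, x), evaluate(g, x))) for x in breakpoints]

def interior(f, g):
    """Lower envelope: the pointwise minimum of f and g."""
    return envelope(f, g, min)

def exterior(f, g):
    """Upper envelope: the pointwise maximum of f and g."""
    return envelope(f, g, max)

f1 = [(F(0), F(0)), (F(1), F(2))]   # f1(x) = 2x on [0, 1]
f2 = [(F(0), F(1)), (F(1), F(1))]   # f2(x) = 1 on [0, 1]
f3 = interior(f1, f2)
f4 = exterior(f1, f2)
```

Between two consecutive merged breakpoints both functions are affine, so at most one crossing can occur there; this is why inserting a single intersection point per sub-interval suffices.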

We now work on the reset-free components $\mathcal{G}_{\mathcal{F}_i}$, and give an algorithm
to compute $OptCost_{\mathcal{G}_{\mathcal{F}_i}}(l, v)$ for every state $(l, v)$ of
$\mathcal{G}_{\mathcal{F}_i}$ with $v(x) \in [0, 1 + \delta]$. We also show the existence of an
N such that for any $\epsilon > 0$, every $l \in L^i$ and $v(x) \in [0, 1 + \delta]$, an
$(\epsilon, N)$-acceptable strategy can be computed. Consider the location of
$\mathcal{G}_{\mathcal{F}_i}$ that has the smallest price and call it $l_{min}$. If this is a
player 1 location, then intuitively, player 1 would want to spend as much time as possible
here, and if this is a player 2 location, then player 2 would want to spend as little time as
possible here. By our assumption, all the cycles in $\mathcal{G}_{\mathcal{F}_i}$ are
non-negative, and hence if $l_{min}$ is part of a cycle, revisiting it can only increase the
total cost, if it changes it at all. Player 1 would thus like to spend all the time he wants
to during the first visit itself. We now prove that this is indeed the case. We consider two
cases separately.



5.4.1 $l_{min}$ is a Player 1 location

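We split $\mathcal{G}_{\mathcal{F}_i}$ such that $l_{min}$ is visited only once. We transform
$\mathcal{G}_{\mathcal{F}_i}$ into $\mathcal{G}''_{\mathcal{F}_i}$, which has two copies of all
locations except $l_{min}$: corresponding to every location $l \ne l_{min}$, we have the copies
$(l, 0)$ and $(l, 1)$. A special target location S is added with a cost function assigning
$+\infty$ to all clock valuations.

Given the transitions X of $\mathcal{G}_{\mathcal{F}_i}$, the FRPTG
$\mathcal{G}''_{\mathcal{F}_i}$ has the following transitions:

- if $l \xrightarrow{g} l' \in X$ and $l, l' \ne l_{min}$, then $(l, 0) \xrightarrow{g} (l', 0)$
  and $(l, 1) \xrightarrow{g} (l', 1)$;
- if $l \xrightarrow{g} l' \in X$ and $l' = l_{min}$, then $(l, 0) \xrightarrow{g} l_{min}$ and
  $(l, 1) \xrightarrow{g} S$;
- if $l_{min} \xrightarrow{g} l \in X$, then $l_{min} \xrightarrow{g} (l, 1)$.

► Lemma 14. For every state $(l, v)$ with $v \in [0, 1 + \delta]$ and $l \ne l_{min}$, we have
that $OptCost_{\mathcal{G}_{\mathcal{F}_i}}(l, v) = OptCost_{\mathcal{G}''_{\mathcal{F}_i}}((l, 0), v)$
and $OptCost_{\mathcal{G}_{\mathcal{F}_i}}(l_{min}, v) = OptCost_{\mathcal{G}''_{\mathcal{F}_i}}(l_{min}, v)$.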

We give an intuition for Lemma 14. Locations $(l, 0)$ have all the transitions available to
location l in $\mathcal{G}_{\mathcal{F}_i}$. Also, any play in $\mathcal{G}_{\mathcal{F}_i}$
which is compatible with a winning strategy of player 1 in $\mathcal{G}''_{\mathcal{F}_i}$
contains only one of the locations $(l, 0)$, $(l, 1)$, by construction of
$\mathcal{G}''_{\mathcal{F}_i}$. The outcomes from $(l, 0)$ are more favourable than those from
$(l, 1)$ when l is a player 1 location. Based on these intuitions, we conclude that
$OptCost_{\mathcal{G}_{\mathcal{F}_i}}(l, v)$ is the same as that for $((l, 0), v)$. This
observation also leads to the $\epsilon$-optimal strategy being the same as that for $(l, 0)$.
Given a strategy $\sigma'$ in $\mathcal{G}''_{\mathcal{F}_i}$, we construct $\sigma$ in
$\mathcal{G}_{\mathcal{F}_i}$ as $\sigma(l, v) = \sigma'((l, 0), v)$. Further, any strategy that
revisits $l_{min}$ in $\mathcal{G}_{\mathcal{F}_i}$ cannot be winning for player 1, since all
cycles are non-negative; we end up at S with cost $\infty$ in $\mathcal{G}''_{\mathcal{F}_i}$.
However, all strategies that do not revisit $l_{min}$ in $\mathcal{G}_{\mathcal{F}_i}$ are
preserved in $\mathcal{G}''_{\mathcal{F}_i}$, and hence
$OptCost_{\mathcal{G}_{\mathcal{F}_i}}(l_{min}, v) = OptCost_{\mathcal{G}''_{\mathcal{F}_i}}(l_{min}, v)$.
We iteratively solve the part of $\mathcal{G}''_{\mathcal{F}_i}$ with locations indexed 1
(i.e., $(l, 1)$) in the same fashion (picking minimal price locations), each time obtaining a
smaller PTG. Computing the cost function of the minimal price location of the last such PTG
and propagating this backward, we compute the cost function of $l_{min}$. We then use the cost
function of $l_{min}$ to solve the part of $\mathcal{G}''_{\mathcal{F}_i}$ with locations
indexed 0 (i.e., $(l, 0)$).
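The splitting of a component around $l_{min}$ can be sketched as a small graph transformation. This is an illustration under assumptions: transitions are triples `(source, guard, target)` with opaque guards, the successors of $l_{min}$ are taken to lie in the copy indexed 1 (consistent with that copy being solved first), and a self-loop on $l_{min}$, which the text does not discuss, is sent to the sink. The function name is hypothetical.

```python
def split_on_lmin(transitions, lmin, sink="S"):
    """Build the transitions of the two-copy game in which lmin is visited only once."""
    out = []
    for src, guard, dst in transitions:
        if src != lmin and dst != lmin:
            # Edges not touching lmin are duplicated in both copies.
            out.append(((src, 0), guard, (dst, 0)))
            out.append(((src, 1), guard, (dst, 1)))
        elif src != lmin:
            # Entering lmin: allowed from copy 0, diverted to the sink from copy 1.
            out.append(((src, 0), guard, lmin))
            out.append(((src, 1), guard, sink))
        elif dst != lmin:
            # Leaving lmin leads into copy 1 (assumption, see lead-in).
            out.append((lmin, guard, (dst, 1)))
        else:
            # Self-loop on lmin: treated as a revisit (assumption).
            out.append((lmin, guard, sink))
    return out

edges = [("a", "g1", "lmin"), ("lmin", "g2", "b"), ("a", "g3", "b")]
split = split_on_lmin(edges, "lmin")
```

In the result, every path that returns to $l_{min}$ after leaving it necessarily ends in the sink S, whose cost is $+\infty$, which mirrors the argument that revisiting $l_{min}$ is never useful for player 1.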
Computing the Optcost function of $l_{min}$. Algorithm 1 computes the optcost function for a
player 1 location $l_{min}$, assuming all the constraints on outgoing transitions from $l_{min}$
are the same, namely $x \in [0, 1]$. We discuss adapting the algorithm to work for transitions
with different constraints in Appendix G. A few words on the notation used: if a location l
has price $\eta(l)$, then the slope associated with l is $-\eta(l)$ (see STEP 3 in Algorithm 1).

Let $l_1, \dots, l_n$ be the successors of $l_{min}$, with cost functions $f_1, \dots, f_n$.
Each of these cost functions is piecewise-affine continuous over the domain $[0, 1]$. The
first thing to do is to superimpose $f_1, \dots, f_n$ and obtain the cost function f
corresponding to the interior of $f_1, \dots, f_n$ ($l_{min}$ is a player 1 location and would
like to obtain the minimal cost, hence the interior). The line segments comprising f come from
the various $f_i$. Let $dom(f) = [0, 1]$ be composed of
$0 = u_{i_1} \le v_{i_1} = u_{i_2} \le \dots \le u_{i_m} \le v_{i_m} = 1$; that is,
$f(x) = f_{i_j}(x)$ on $dom(f_{i_j}) = [u_{i_j}, v_{i_j}]$, for $i_j \in \{1, 2, \dots, n\}$ and
$1 \le j \le m$. Let us denote $f_{i_j}$ by $g_j$, for $1 \le j \le m$. Then f is composed of
$g_1, g_2, \dots, g_m$, and $dom(f)$ is composed of $dom(g_1), \dots, dom(g_m)$ from left to
right. Let $dom(g_i) = [u_i, v_i]$. Step 2 of the algorithm achieves this.

For a given valuation $v(x)$, if $l_{min}$ is an urgent location, then player 1 would go to a
location $l_k$ if the interior f is such that $f(v(x)) = g_k(v(x))$ (the least cost is given by
$g_k$, obtained from the outside cost function of $l_k$). If $l_{min}$ is not an urgent
location, then player 1 would prefer delaying t units at $l_{min}$, so that
$v(x) + t \in [u_i, v_i]$, rather than going to some location $l_i$, if
$g_i(v(x)) > \eta(l_{min})(v_i - v(x)) + g_i(v_i)$. Again, $g_i$ is a part of the outside cost
function of $l_i$, and player 1 prefers delaying at $l_{min}$ rather than going to $l_i$ since
that minimizes the cost. In this case, the cost function f is refined by replacing the line
segment $g_i$ over $[u_i, v_i]$ by another line segment $h_i$ passing through $(v_i, g_i(v_i))$
and having slope $-\eta(l_{min})$. Step 3 of the algorithm does this.

Algorithm 1: Optimal Cost Algorithm when $l_{min}$ is a Player 1 location

    Let $l_1, \dots, l_n$ be the successors of $l_{min}$ with optcost functions $f_1, f_2, \dots, f_n$;
    STEP 1: Superimpose: Superimpose all the optcost functions $f_1, \dots, f_n$;
    STEP 2: Interior: Take the interior of the superimposition; call it f;
        Let f be composed of line segments $g_1, g_2, \dots, g_m$ such that $g_i \in \{f_1, \dots, f_n\}$ for all i.
        $\forall k$, let the domain of $g_k$ be $[u_k, v_k]$. Set i = m;
    STEP 3: Selective Replacement: while $i \ge 1$ do
        if slope of $g_i \le -\eta(l_{min})$ then
            replace $g_i$ with the line $h_i$ with slope $-\eta(l_{min})$ passing through $(v_i, g_i(v_i))$;
            Let $h_i$ intersect $g_j$ (largest $j < i$) at some point $x = v'_j$, $v'_j \in [u_j, v_j]$;
            Update the domain of $g_j$ from $[u_j, v_j]$ to $[u_j, v'_j]$;
            if $j < i - 1$ then
                Remove functions $g_{j+1}$ to $g_{i-1}$ from f
            Set i = j;
        else
            i = i - 1;
    STEP 4: Refresh Interior: Take the interior after STEP 3 and call it $f'$;
    if $l'' \to l_{min}$ then
        update the optcost function of $l''$
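The replacement performed in Step 3 can be motivated by a short calculation. The block below is a sketch under the stated assumption that all outgoing guards of $l_{min}$ are $x \in [0, 1]$, and it ignores the additional restriction to $[0, 1 - \delta]$; the symbol $s_i$ for the slope of $g_i$ is introduced here only for illustration.

```latex
% Delaying t at l_min from valuation v and then leaving along the interior f:
\[
  \mathrm{OptCost}(l_{min}, v) \;=\; \min_{0 \le t \le 1 - v}
      \bigl( \eta(l_{min})\, t + f(v + t) \bigr).
\]
% On a single segment g_i with slope s_i, for v and v + t in [u_i, v_i]:
\[
  \eta(l_{min})\, t + g_i(v + t) \;=\; g_i(v) + \bigl( \eta(l_{min}) + s_i \bigr)\, t ,
\]
% which is non-increasing in t whenever s_i \le -\eta(l_{min}).  Delaying up to the
% right end point v_i of the segment is then at least as good as leaving at once:
\[
  h_i(v) \;=\; \eta(l_{min})\,(v_i - v) + g_i(v_i) \;\le\; g_i(v),
  \qquad v \in [u_i, v_i].
\]
```

The line $h_i$ is exactly the segment with slope $-\eta(l_{min})$ through $(v_i, g_i(v_i))$ used in Step 3, and its value agrees with the expression stated in Lemma 15.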

Recall that by our transformation 2, the value of clock x in any player 1 location is
$\le 1 - \delta$. The value of x is in $[1 - \delta, 1 + \delta]$ only at a player 2 location
($(A, e)^+_b$ in the FRPTG, Section 5.2). Hence, the domain of cost functions for player 1
locations is actually $[0, 1 - \delta]$, and not $[0, 1 + \delta]$. Let the domain of $g_m$ be
$[u_m, 1]$. Then we can split $g_m$ into two functions $g^1_m$, $g^2_m$ with domains
$[u_m, 1 - \delta]$ and $[1 - \delta, 1]$. Now, we ensure that no time is spent in the player 1
location $l_{min}$ over $dom(g^2_m)$ by not applying Step 3 of the algorithm to $g^2_m$. This
way, selective replacement of the cost functions $g_i$ occurs only in the domain
$[0, 1 - \delta]$, and we remain faithful to transformation 2 and to the semantics of RPTGs.

Computing Almost Optimal Strategies: The strategy corresponding to this computed optcost is
derived as follows. $f'$ is the optcost of location $l_{min}$ computed in Step 4 of the
algorithm. $f'$ is composed of two kinds of functions: (a) the functions $g_i$ computed in
Step 2 as a result of the interior of the superimposition, and (b) the functions $h_i$ which
replaced some functions $g_j$ from f, corresponding to a delay at $l_{min}$. For functions
$h_j$ of $f'$ with domain $[u_j, v_j]$, we prescribe the strategy to delay at $l_{min}$ until
$x = v_j$ when entered with clock $x \in [u_j, v_j]$. For functions $g_i$ that come from f at
Step 2, where $g_i$ is part of some optcost function $f_k$ ($f_k$ being the optcost function
of one of the successors $l_k$ of $l_{min}$), the strategy dictates moving immediately to
$l_k$ when entered with clock $x \in [u_i, v_i]$.

Termination: Finally, we prove the existence of a number N bounding the number of affine
segments that appear in the cost functions of all locations. Start with the resetfree FRPTG
with m locations having p segments in the outside cost functions. Let $a(m, p)$ denote the
total number of affine segments appearing in cost functions across all locations. The
transformation of a resetfree component $\mathcal{G}_{\mathcal{F}}$ into
$\mathcal{G}''_{\mathcal{F}}$ gives rise to two smaller resetfree FRPTGs with $m - 1$ locations
each, after separating out $l_{min}$. The resetfree FRPTG $(\mathcal{G}_{\mathcal{F}}, 1)$,
with $m - 1$ locations indexed with 1 and of the form $(l, 1)$, is solved first; these cost
functions are added as outside cost functions to solve $l_{min}$; and finally, the cost
function of $l_{min}$ is added as an outside cost function to solve the resetfree FRPTG
$(\mathcal{G}_{\mathcal{F}}, 0)$, with $m - 1$ locations indexed with 0 and of the form
$(l, 0)$. Taking into account the new sink target location added, we have $\le p + 1$ segments
in the outside cost functions in $(\mathcal{G}_{\mathcal{F}}, 1)$. This gives at most
$\beta = a(m - 1, p + 1)$ segments in solving $(\mathcal{G}_{\mathcal{F}}, 1)$, then
$a(1, p + \beta) = \gamma$ segments to solve $l_{min}$, and finally $a(m - 1, p + \gamma)$
segments to solve $(\mathcal{G}_{\mathcal{F}}, 0)$. Solving this, one can check that $a(m, p)$
is at most triply exponential in the number of locations m of the resetfree component
$\mathcal{G}_{\mathcal{F}}$. Given a bound on the number of affine segments, it is easy to see
that Algorithm 1 terminates; the time taken to compute almost optimal strategies and optcost
functions is triply exponential.

We illustrate the computation of Optcost of a Player 1 location in Figure 2. The proof of
Lemma 15 is given in Appendix G, while Lemma 16 follows from Lemma 15 and Step 4 of
Algorithm 1.

► Lemma 15. In Algorithm 1, if a function $g_i$ (in f of Step 2) has domain $[u_i, v_i]$ and
slope $\le -\eta(l)$, then $OptCost(l, v) = (v_i - v) \cdot \eta(l) + g_i(v_i)$.

► Lemma 16. The function $f'$ in Algorithm 1 computes the optcost at any location l. That
is, $\forall x \in [0, 1]$, $OptCost_{\mathcal{G}}(l, x) = f'(x)$.


Note that the strategy under construction is a player 1 strategy, and player 1 has no control
over the interval $[1, 1 + \delta]$: we have $x \in [1, 1 + \delta]$ only after a positive
perturbation, and this is under player 2's control. Thus, at a player 1 location, proving the
claim for $x \in [0, 1]$ suffices.


5.4.2 $l_{min}$ is a Player 2 location

If $l_{min}$ is a player 2 location in the reset-free component $\mathcal{G}_{\mathcal{F}_i}$,
then intuitively, player 2 would want to spend as little time as possible there. Keeping this
in mind, we first run Steps 1 and 2 of Algorithm 1, taking the exterior of $f_1, \dots, f_n$
instead of the interior (player 2 would maximise the cost). There is no time elapse at
$l_{min}$ on running Steps 1 and 2 of the algorithm. Let f be the exterior computed using
Steps 1 and 2. If f comprises functions $g_i$ having a greater slope than $-\eta(l_{min})$,
then it is better to delay at $l_{min}$ to increase the cost. In this case, player 2 would want
to improve his optcost using Step 3, by spending time at $l_{min}$. Finally, while doing
Step 4, we take the exterior of the replaced functions $h_i$ and the old functions $g_i$.
Recall that our transformations resulted in 3 kinds of player 2 locations: urgent ones, those
with dwell-time restriction $[0, \delta]$, and finally those with $[\delta, 2\delta]$. The 3
cases are discussed in detail in Appendix H.


6 Conclusion and Future Work

In this paper we studied the excess robust semantics, provided the first decidability result
for excess semantics, and improved the known undecidability result from 10 clocks to 5 clocks.
To the best of our knowledge, the other known decidability result for robust timed games is
under the conservative semantics for a fixed $\delta$ [S]. As a consequence of our
decidability result, the reachability problem for 1-clock PTGs with arbitrary prices is shown
to be decidable too, under the assumption that the PTG does not have any negative cost cycle.
The decidability we show is for a fixed perturbation bound $\delta > 0$. We use $\delta$ in
the constraints of the dwell-time PTG after the first transformation for ease of understanding
the robust semantics. Implementing this in Step 3 of Algorithm 1 and ensuring no time elapse
in the interval $[1 - \delta, 1]$ takes no extra effort when $l_{min}$ is a player 1 location.
In that sense, we could have avoided explicit use of $\delta$ in the constraints in our
simplifying transformations, and taken the appropriate steps in the algorithm itself. The
existence of a limit-strategy with $\delta \to 0$ seems rather hard. Our construction would
not directly extend to the limit-strategy problem, as it is heavily dependent on the fixed
$\delta$.

Figure 2 Optcost computation for a Player 1 location ($\delta = 0.1$): we can keep the guards
as $0 \le x \le 1$ and not apply Step 3 for $x \in [1 - \delta, 1]$. [Figure not reproduced.
The panel "Step 3: Selectively Replace" is annotated with the strategy: delay at l for
0 < x < 0.5; go to B for 0.5 < x < 0.54; delay at l for 0.54 < x < 0.9; go to A for x = 0.9;
go to A for 0.9 < x < 1.1.]

Appendix

A Cost Functions



We illustrate the cost functions with an example. In the PTG given here, the cost function
f corresponding to the target gives the cost incurred when the target is entered with various
values of clock x. For example, if the target is reached with clock value $x = 0$, then the
cost incurred is $cost = 3 = f(0)$, while $cost = 0$ if it is entered with $x \in [0.5, 1]$.
Suppose B is entered with $x = 0$ and Player 1 decides to go to the target immediately with no
delay at B (i.e., delay $d = 0$); then the cost is
$cost = 1 \cdot d + f(v) = 1 \cdot 0 + f(0) = 3$, where $x = v$ upon entering the target.
However, if player 1 waited at B until $x = 0.5$ and then went to the target, the cost would
be $1 \cdot 0.5 + f(0.5) = 0.5$. Similarly, if $d = 0.75$ then the cost is 0.75. From this, we
can infer that the best strategy for Player 1 to achieve the optimal cost is to wait until
$x = 0.5$ and then go to the target. The second function, labelled OptCost of B, gives the
optimal cost achievable for every value of x that B is entered with. A similar analysis for
location A reveals that the cost incurred is $-1$ if Player 2 goes to the target directly.
Otherwise, he could wait at A and then go to B. Due to the negative price at A, it is obvious
that the best strategy for Player 2 is to go to B immediately. Thus, the optimal cost function
for A is the same as that of B.
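The numbers for location B can be reproduced with a small brute-force search over delays. This is a sketch under assumptions: the figure is not available here, so the target cost function is taken to be affine between the two stated values $f(0) = 3$ and $f(0.5) = 0$, the price of B is taken to be 1 as in the computation above, and the optimum is approximated on a grid of delays. The function names are hypothetical.

```python
from fractions import Fraction as F

def target_cost(x):
    """Assumed shape of f: f(0) = 3, f(x) = 0 on [0.5, 1], affine in between."""
    return max(F(0), 3 - 6 * x)

def cost_at_B(v, d, price=1):
    """Cost of entering B with x = v, waiting d time units, then moving to the target."""
    return price * d + target_cost(v + d)

def opt_cost_at_B(v, steps=100):
    """Minimum over a grid of delays d with v + d <= 1 (a sampled approximation)."""
    delays = [(1 - v) * F(k, steps) for k in range(steps + 1)]
    return min(cost_at_B(v, d) for d in delays)

no_delay = cost_at_B(0, F(0))
wait_half = cost_at_B(0, F(1, 2))
wait_three_quarters = cost_at_B(0, F(3, 4))
best_from_zero = opt_cost_at_B(F(0))
```

With this assumed f, waiting is profitable exactly where f falls faster than the price of B accumulates, which is the situation handled by Step 3 of Algorithm 1.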


B Undecidability Proof

We present below a set of figures which depict in full detail the simulation of all the
instructions of a two-counter machine: increment, zero test and decrement.

First we describe a few support modules that will be used in the main modules for simulating
increment, zero test and decrement instructions.

B.1 Prevent perturbation module

For correct simulation of the instructions, it is often necessary that the delay made by
controller is not perturbed by perturbator. The module in Figure 3 shows a construction that
prevents perturbator from making any perturbation along the edge from A to B. In run p, the
edge from B to the target ensures that if the delay chosen at A was perturbed then controller
can achieve a cost 0. For better readability, we represent these unperturbed edges as dashed
arrows, as shown in path p'. We note that the clock which is used in the equality constraint
(x in Figure 3) cannot be reset along the same edge. If we do not specify a clock that is
being reset along the dashed edge, we consider it to be $y_2$. For any other clock, we show it
as being reset along the dashed edge. Note that in the 'prevent perturbation module', we need
at least one equality constraint ($x = 1$ in Figure 3), thus ensuring a deterministic delay.

Figure 3 Prevent perturbation module: x is some clock and $x = k$ could be the constraint for
any $k \in \mathbb{N}$. The triangle with 0 represents the target location l with cost
function $cf(l) : \mathbb{R}_{\ge 0} \to 0$.






















B.2 Choice module 

Since we consider a priced timed automaton and not a PTG, perturbator does not own a
location from which it can choose the successor location. Figure 4 shows the construction of
a module that allows perturbator to choose the successor location. The delay from location A
to location B can be perturbed by perturbator. Controller chooses $C_2$ as the successor if
the perturbation is positive, and chooses $C_1$ as its successor if the perturbation is
negative. We note that if the module was entered with $x_1 = \alpha_1$, $z = \beta$,
$x_2 = \alpha_2$, $y_1 = y_2 = 0$, then upon leaving either $L_1$ or $L_2$ the clock values are
$x_1 = 3 + \alpha_1$, $z = 3 + \beta$, $x_2 = 3 + \alpha_2$, $y_1 = y_2 = 0$. The mChoice
(modified choice) module shown in Figure 5 is the same as the Choice module except for the
fact that here the value of clock $y_1$ is 1 upon entry. Thus the constraint on the edge
between locations A and B is $y_1 = 2$ instead of $y_1 = 1$ as in the Choice module. Here too,
the values of clocks $x_1$, z and $x_2$ are increased by 3 as in the Choice module, while
clocks $y_1$ and $y_2$ have value 0 on exit.

Figure 4 Choice module: Perturbator can choose to go to $C_2$ if he perturbs the delay at B by
a positive value. If he does not perturb, or perturbs by a negative value, then he goes to $C_1$.

Figure 5 mChoice module: The mChoice (modified choice) module is the same as the Choice
module except for the fact that here the value of clock $y_1$ is 1 upon entry.

B.3 Restore module 

Both the Choice and mChoice modules add a shift of 3 to the values of clocks $x_1$, $x_2$ and
z. Since the main modules simulating increment and decrement of the counters expect the values
to be in their normal forms, we need to remove the shift of 3; this is achieved by the Restore
module shown in Figure 6. The restore modules used in the main modules simulating the
operations on counter $C_1$ are a group of four different modules, as described below.
$Restore^{Inc}_{C_1 > C_2}$ denotes the module used as part of the Increment module for
counter $C_1$. We similarly have the $Restore^{Dec}_{C_1 > C_2}$ module, which is used as part
of the Decrement module. The $Restore^{Dec}_{C_1 > C_2}$ module is similar to the
$Restore^{Inc}_{C_1 > C_2}$ module, the only difference being that the clock constraint on the
loop on C is $z = 6$ instead of $z = 5$ as in the $Restore^{Inc}_{C_1 > C_2}$ module. Here
$C_1 > C_2$ denotes that the fractional part of clock $x_1$ is more than the fractional part
of clock z. We also use $Restore^{Inc}_{C_2 > C_1}$ and $Restore^{Dec}_{C_2 > C_1}$ to denote
that the fractional part of clock z is more than the fractional part of clock $x_1$. The
$Restore^{Inc}_{C_2 > C_1}$ module can be obtained from the $Restore^{Inc}_{C_1 > C_2}$ module
by replacing all the occurrences of clock $x_1$ with clock z and all the occurrences of clock
z with clock $x_1$. $Restore^{Dec}_{C_2 > C_1}$ can be obtained from
$Restore^{Dec}_{C_1 > C_2}$ in the same way. The edge from location C to location D forces
controller to take the loop at location C only once. The $Restore^{Inc}_{C_1 > C_2}$ and
$Restore^{Inc}_{C_2 > C_1}$ modules are entered with clock values
$x_1 = 3 + \frac{1}{2^i} + \epsilon_1$, $z = 4 + \frac{1}{2^j} + \epsilon_2$,
$x_2 = y_1 = y_2 = 0$ at the starting location A of the module. At location E, the clock
values are $x_1 = \frac{1}{2^i} + \epsilon_1$, $z = \frac{1}{2^j} + \epsilon_2$,
$x_2 = y_1 = y_2 = 0$, i.e. they are restored to their normal form.

The restore modules used in the modules simulating operations on counter $C_2$ are analogous.
Corresponding to $Restore^{Inc}_{C_1 > C_2}$, the delays at locations A, C and D are
$1 - \frac{1}{2^i} - \epsilon_1$, $\frac{1}{2^i} + \epsilon_1 - \frac{1}{2^j} - \epsilon_2$ and
$\frac{1}{2^j} + \epsilon_2$ respectively, while the delays at locations B and E are 0. The
value of clock z at the entry of $Restore^{Dec}_{C_1 > C_2}$ and $Restore^{Dec}_{C_2 > C_1}$ is
$5 + \frac{1}{2^j} + \epsilon_2$, and the clock values at the exit are the same as for the
$Restore^{Inc}_{C_1 > C_2}$ and $Restore^{Inc}_{C_2 > C_1}$ modules.

Figure 6 Restore module: This is actually a group of four modules.
$Restore^{Inc}_{C_1 > C_2}$ is shown in the figure.

We show below the main modules which are used for simulating zero test and decrement.
We show here the modules corresponding to the operations on counter $C_1$. The modules
corresponding to the operations on counter $C_2$ are analogous.

B.4 Decrement module 

The module simulating decrement of counter $C_1$ is shown in Figure 7. Recall that by the
normal form, the values of the clocks are $x_1 = \frac{1}{2^i} + \epsilon_1$,
$z = \frac{1}{2^j} + \epsilon_2$, $y_1 = y_2 = x_2 = 0$ at $l_i$.

Figure 7 Zero test and Decrement module: This module simulates the instruction "If ($c_1 = 0$)
go to $l_j$ else go to $l'_j$". The extensions from B and D are shown in Figures 8 and 9
respectively.

1. Assume that $c_1 > 0$ at $l_i$. Controller can choose to go to B or D, since the
   constraints on both the edges are the same. If $c_1 > 1$, controller chooses to go to B,
   and if $c_1 = 1$, then controller goes to D. Consider $c_1 > 1$, with controller visiting
   B. By the encoding, $x_1 = \frac{1}{2^i} + \epsilon_1$ with $i > 1$,
   $z = \frac{1}{2^j} + \epsilon_2$, $x_2 = y_1 = y_2 = 0$. Here $\epsilon_1$ and $\epsilon_2$
   denote the errors accumulated so far in clocks $x_1$ and z due to the perturbations made by
   perturbator. Figure 8 shows the section of the module shown in Figure 7 starting from
   location B. This section simulates the decrement of counter $C_1$ when the value of the
   counter is greater than 1. The value of clock z simulating counter $C_2$ remains unchanged.

   Let us denote the value of $x_1$ at the entry of the module Decrement $C_1$ in Figure 8,
   i.e. $\frac{1}{2^i} + \epsilon_1$, by a. Thus the delays at locations B and C are
   respectively $1 - a$ and a. On entry at D, we thus have $x_2 = a$, $y_1 = 0$, $y_2 = 1$. A
   non-deterministic time t is spent at D, simulating the decrement of $C_1$. Ideally, t must
   be $1 - 2a$. Perturbator can perturb it by $\delta$, where $\delta$ can be both positive
   and negative, and clock $x_1$ is reset. On entering E we thus have $x_1 = 0$,
   $y_1 = t + \delta$, $x_2 = a + t + \delta$. At the entry to the mChoice module, the values
   of the clocks are $x_1 = 1 - t - \delta$, $z = 2 + \frac{1}{2^j} + \epsilon_2$,
   $x_2 = 1 + a$, $y_1 = 1$, $y_2 = 0$. To correctly decrement $C_1$ (whose value is i),
   $1 - t$ should be exactly 2a, i.e. $\frac{1}{2^{i-1}} + 2\epsilon_1$. Perturbator uses the
   mChoice module either to continue the simulation (by going to the Restore module) or to
   verify controller's delay t. Due to the mChoice module, the clock values are
   $x_1 = 4 - t - \delta$, $z = 5 + \frac{1}{2^j} + \epsilon_2$, $x_2 = 4 + a$, $y_1 = 0$,
   $y_2 = 0$. If perturbator chooses to continue the simulation, then the Restore module
   restores the clocks back to normal form, and hence upon entering $l'_j$ the clock values
   are $x_1 = 1 - t - \delta$, $z = \frac{1}{2^j} + \epsilon_2$, $x_2 = 0$, $y_1 = 0$,
   $y_2 = 0$. Thus, we have $x_1 = \frac{1}{2^{i-1}} + 2\epsilon_1 - \delta$, where
   $2\epsilon_1 - \delta$ is the value due to the perturbations so far.

   However, if perturbator chooses to verify, he first goes to yet another Choice module. If
   $1 - t > 2a$, then the module $Test\ Dec^{C_1}_{>}$ is used, and if $1 - t < 2a$, then the
   module $Test\ Dec^{C_1}_{<}$ is used. Note that due to the two Choice modules one after the
   other, the clock values upon entering $Test\ Dec^{C_1}_{>}$ or $Test\ Dec^{C_1}_{<}$ are
   $x_1 = 7 - t - \delta$, $x_2 = 7 + a$, $y_1 = y_2 = 0$.

   $Test\ Dec^{C_1}_{>}$: At A', on entry we have $x_1 = 7 - t - \delta$, $x_2 = 7 + a$,
   $y_1 = y_2 = 0$. A time $1 - a$ is spent at A' with accumulated cost $2 - 2a$. On entry to
   B', we have $x_1 = 8 - a - t - \delta$, $y_1 = 1 - a$. A time a is spent at B', and
   $x_1 = 8 - t - \delta$. A time $t + \delta$ is spent at C', obtaining $y_1 = t + \delta$. A
   time $1 - t - \delta$ is spent at D', obtaining the accumulated cost
   $2 - 2a + 1 - t - \delta$. The target is reached with this cost. If $1 - 2a > t$, then this
   is $> 2 - \delta$. The perturbator can choose $\delta < 0$, making this cost $> 2$.

2. Controller chooses the outgoing edge to D in Figure 7 if $c_1$ is 1, in which case the
   decremented value is 0, which is encoded by the exact value $x_1 = 1$. The module from D is
   shown in Figure 9.

   Figure 9 shows the section of the module of Figure 7 starting from location D. This section
   simulates the decrement of counter $C_1$ when $c_1 = 1$. Upon entering D in the Test and
   Decrement module, the clock values are $x_1 = \frac{1}{2} + \epsilon_1$,
   $z = \frac{1}{2^j} + \epsilon_2$, $x_2 = y_1 = y_2 = 0$.

   Let a denote the value of $x_1$, i.e. $\frac{1}{2} + \epsilon_1$. The times elapsed in
   locations D, E and F in Figure 9 are respectively $1 - a$, a and 1. At the entry of the
   Choice module, the clock values are $x_1 = 1$, $z = 2 + \frac{1}{2^j} + \epsilon_2$,
   $x_2 = 1 + a$, $y_1 = y_2 = 0$. Here $x_1$ encodes the counter value of $C_1$ exactly, and
   perturbator cannot perturb the delay made by the controller. Perturbator uses the Choice
   module either to continue the simulation or to verify the delay made by controller. Due to
   the Choice module, the clock values are $x_1 = 4$, $z = 5 + \frac{1}{2^j} + \epsilon_2$,
   $x_2 = 4 + a$, $y_1 = y_2 = 0$. If perturbator chooses to continue the simulation, then the
   Restore module restores the clocks back to the normal form, and hence upon entering $l'_j$
   the clock values are $x_1 = 1$, $z = \frac{1}{2^j} + \epsilon_2$, $x_2 = y_1 = y_2 = 0$.

   However, if perturbator chooses to verify, he uses the $Test\ Dec^{C_1}_{=1}$ module to
   verify whether controller chose this branch (D) of the Test and Decrement module when
   $c_1 = 1$ or when $c_1 > 1$.

   $Test\ Dec^{C_1}_{=1}$: On entry, we have $x_1 = 7$, $z = 8 + \frac{1}{2^j} + \epsilon_2$,
   $x_2 = 7 + a$, $y_1 = y_2 = 0$. The delays at the locations are as follows: at A', a delay
   of $1 - a$, obtaining $y_1 = 1 - a$ on entering B'. A time elapse of a at B' gives
   $x_1 = a$. Finally, at C', we elapse $1 - a$. Thus the cost incurred in this module is
   $3 - 3a$. For $c_1 = 1$, this is
   $3 - \frac{3}{2} - 2\epsilon_1 = 2 - \frac{1}{2} - 2\epsilon_1$, and the minimum cost when
   $c_1 > 1$ is $3 - \frac{3}{2^2} - 2\epsilon_1 = 2 + \frac{1}{4} - 2\epsilon_1$. In the
   limit, as $\epsilon_1$ tends to 0, the cost is $< 2$ if controller chose the correct
   branch, that is, chose D when $c_1 = 1$.

Footnote (to the accumulated cost $2 - 2a$ in the Test Dec module of item 1): The price 2 on A'
can be replaced with 1 by having a slightly longer sequence of transitions.

Figure 8 Decrement $C_1$ module: The section of the module shown in Figure 7 starting from
location B. This section is used if $c_1 > 1$ before being decremented. It keeps the
fractional part of clock z unchanged. The price 2 at A' is a shorthand, and can be replaced
with 1 on having a longer sequence of transitions.

3. Suppose controller chooses B instead of D when $c_1 = 1$. Then the value of clock $x_1$ after
simulating the decrement operation will not be exact, i.e., it will not be equal to 1. Now, if
the next instruction involving counter $c_1$ is also a zero test and decrement operation,
then controller will incorrectly move to $l'_j$ instead of $l_j$ while simulating it.
Controller is punished for choosing B instead of D while simulating this next zero test
and decrement operation: since the value of clock $x_1$ is not 1, controller will go
to either B or D in the module in Figure 7.

_ If B is chosen, t should equal 1 — 2a for correct simulation. Now a being 1 + £ 1 , 
controller cannot delay for 1 — 2a at location D of Figure [8] and hence is punished. 

_ If controller goes to location D in Figure [7J when Ci = 0, then x\ = 1 + £1 = a then 
perturbator moves to the module Test Dec^L 0 . If £1 > 0, then the controller will get 
stuck in the transition from D to E (see Figure [9]) and if £1 < 0, then the module 
Test Dec^L 0 in Figure 9 incurs a cost of 2 — 2 ei > 2. The module Test Dec^L 0 can 
be drawn similar to Test DecxL 1 . 
Figure 9 Decrement $c_i$ from 1 to 0 module: the section of the module shown in Figure 7
starting from location D. This section is used if $c_i = 1$ before being decremented. It keeps
the fractional part of clock $z$ unchanged.


B.5 Complete Reduction 

The entire reduction consists of constructing a module corresponding to each instruction $I_i$,
$1 \le i \le n$, of the two-counter machine. The first location of the module corresponding to
instruction $I_1$ is the initial location. We simulate the halting instruction $I_n$ by a target
location whose cost function assigns 2 to all clock values. We denote the robust timed automaton
simulating the two-counter machine by $\mathcal{A}$, and $s$ is the initial state $(l, 0, 0)$.

► Lemma 17. The two-counter machine halts if and only if there is a strategy $\sigma$ of controller
such that $\mathrm{limcost}_{\mathcal{A}}(\sigma, s) \le 2$.

Proof. We first consider the case when the two-counter machine halts. Suppose it halts in $m$
steps. The cost incurred in $m$ steps can be due to reaching one of the target states in a test
module or reaching the halt instruction in $m$ steps. We consider an $\varepsilon$ such that
$0 < 3^m \delta < \varepsilon$. In the first case, the cost is less than or equal to
$2 + 2\varepsilon_b$, where by Lemma 18 $\varepsilon_b < \varepsilon$, and hence the cost is 2
in the limit. In the second case, controller simulates the two-counter machine faithfully and
reaches the target location corresponding to the halt instruction, and hence the cost is 2 in
the limit.

Now we consider the case when the two-counter machine does not halt. Controller can
simulate the two-counter machine using the increment and the zero test and decrement modules
corresponding to each of the instructions. The cost is $\infty$ if controller simulates the
instructions faithfully and a target state is not reached. On the other hand, if controller
makes an error, then it will be punished by perturbator in one of the test modules and the cost
will be non-zero. Hence the proof. ◄

Given an accumulated delay $\varepsilon$, the accumulated delays after one step due to the
decrement and the increment modules are $2\varepsilon + \delta_1$ and $\varepsilon/2 + \delta_2$
respectively. The following lemma is from

0 - 

► Lemma 18. Consider the two functions $f : x \mapsto 2x + 1$ and $g : x \mapsto x/2 + 1$. For any
$n \ge 1$, $x > 0$, and any $f_1, \ldots, f_n \in \{f, g\}$, $f_1 \circ f_2 \circ \cdots \circ f_n(x) < 3^n x$.

We note that the prices used in all the modules in our reduction are only $\{0, 1\}$ and hence
we have the undecidability result as given by Theorem 1.

C Proof of Lemma 11

We first map the states of the RPTG $\mathcal{R}$ and the dwell-time PTG $\mathcal{G}$. Let
$S(\mathcal{R})$ denote the set of states of the form $(l, \nu)$ as well as $((l, \nu), t, e)$ of
the RPTG, and let $S(\mathcal{G})$ denote the set of states of the dwell-time PTG.

► Definition 19 (state map). We define a state map $f : S(\mathcal{R}) \to S(\mathcal{G})$ as follows.

- If $l$ is a controller (perturbator) location then $f(l, \nu) = (l, \nu)$, as all controller
locations of $\mathcal{R}$ become player 1 locations in the dwell-time PTG $\mathcal{G}$, and all
the perturbator locations of $\mathcal{R}$ become player 2 locations in the dwell-time PTG.

- Recall that the RPTG has states of the form $((l, \nu), t, e)$ corresponding to perturbator
states (after controller chooses a time delay and an edge, perturbator decides the perturbation).
Recall also that for every controller location $l$ and corresponding edge choice $e$ made in
the RPTG $\mathcal{R}$, we have the urgent player 2 location $(l, e)$ immediately following the
player 1 location $l$ in the constructed dwell-time PTG $\mathcal{G}$. That is,
$f((l, \nu), t, e) = ((l, e), \nu + t - \delta)$.

Note that $f(s)$ is a unique state in $\mathcal{G}$.
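The state map can be illustrated with a short sketch. The tuple encoding of states and the name `state_map` are assumptions made only for this illustration; the definition itself fixes just the two equations in the comments.

```python
from fractions import Fraction

def state_map(state, delta):
    """Sketch of the state map f from RPTG states to dwell-time PTG states.

    Hypothetical encoding: a plain state is (location, valuation); an
    intermediate perturbator state is ((location, valuation), delay, edge).
    """
    if len(state) == 2:
        location, valuation = state
        # f(l, v) = (l, v) for controller and perturbator locations
        return (location, valuation)
    (location, valuation), delay, edge = state
    # f((l, v), t, e) = ((l, e), v + t - delta)
    return ((location, edge), valuation + delay - delta)

delta = Fraction(1, 10)
plain = state_map(("A", Fraction(1, 2)), delta)
intermediate = state_map((("A", Fraction(1, 2)), Fraction(3, 10), "e"), delta)
```

Exact rationals are used so that the shift by $\delta$ is not blurred by floating-point rounding.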



Figure 10 Transitions of RPTG $\mathcal{R}$ mapped to transitions in the constructed dwell-time PTG $\mathcal{G}$


► Lemma 20. Given a path $\rho$ in $\mathcal{R}$ from $s$ to $s'$, there exists a unique path
$\rho'$ in $\mathcal{G}$ from $f(s)$ to $f(s')$. Additionally, $\mathrm{Cost}(\rho) = \mathrm{Cost}(\rho')$.

The proof is straightforward and follows from the structure and the state map defined
above.

Next, given a strategy $\sigma_1$ in $\mathcal{R}$, we shall define an equivalent strategy
$\sigma'_1$ in $\mathcal{G}$ in terms of the moves proposed. Let $e$ be the edge from $l$ to $l'$
in $\mathcal{R}$. We map $\sigma(\rho.s) = (t, e)$ to $\sigma'(\rho'.f(s)) = (t', e')$ as follows.

1. Controller strategy mapped to Player 1 strategy:

The strategy $\sigma_1(\rho.(l, \nu)) = (t, e)$ in $\mathcal{R}$ leads to the state
$((l, \nu), t, e)$. Corresponding to this, we have $\sigma'_1(\rho'.f(l, \nu)) = (t', e')$ such
that $t' = t - \delta$ (note that $t \ge \delta$ in $\mathcal{R}$ due to the robust semantics),
and the player 1 location $l$ moves into the urgent player 2 location $(l, e)$. This leads to
$((l, e), \nu + t - \delta)$. Here $e'$ is the edge in $\mathcal{G}$ between $l$ and $(l, e)$.
Recall also that the time delay $t$ in $\mathcal{R}$ has been mapped to the time delay
$t - \delta$ in the constructed PTG $\mathcal{G}$.

2. Perturbator strategy mapped to Player 2 strategy for perturbator locations:

A strategy $\sigma_2(\rho.(l, \nu)) = (t, e)$ in $\mathcal{R}$ leads to $(l', \nu + t[r := 0])$.
Correspondingly, we have in $\mathcal{G}$, $\sigma'_2(\rho'.(l, \nu)) = (t, e)$, giving the state
$(l', \nu + t[r := 0])$ in $\mathcal{G}$.

3. Perturbator strategy mapped to Player 2 strategy for new locations:

Recall that we have $f((l, \nu), t, e) = ((l, e), \nu + t - \delta)$. Suppose we have the strategy
$\sigma_2(\rho.((l, \nu), t, e)) = \varepsilon \in [-\delta, \delta]$ in $\mathcal{R}$.

- If $0 \le \varepsilon \le \delta$ then $\sigma'_2(\rho'.((l, e), \nu + t - \delta)) = (0, e_p)$,
which results in $((l, e)^+, \nu + t - \delta)$, and
$\sigma'_2(\rho'.((l, e)^+, \nu + t - \delta)) = (\delta + \varepsilon, e_{po})$, resulting in
$(l', \nu + t - \delta + \delta + \varepsilon[r := 0])$.

- If $-\delta \le \varepsilon < 0$, let $\varepsilon' = -\varepsilon$. Then
$\sigma'_2(\rho'.((l, e), \nu + t - \delta)) = (0, e_n)$, which results in
$\rho'.((l, e), \nu + t - \delta) \xrightarrow{0, e_n} ((l, e)^-, \nu + t - \delta)$, and
$\sigma'_2(\rho'.((l, e)^-, \nu + t - \delta)) = (\delta - \varepsilon', e_{no})$, resulting in
$\rho'.((l, e)^-, \nu + t - \delta) \xrightarrow{\delta - \varepsilon', e_{no}} (l', \nu + t - \delta + \delta - \varepsilon'[r := 0])$.
Note that on entering $(l, e)^-$ with a value $\nu + t - \delta$, a time $\varepsilon \in [0, \delta]$
is spent at $(l, e)^-$, obtaining a valuation $\nu + t - \delta + \varepsilon$. This corresponds to
altering the time $t$ spent by controller in $\mathcal{R}$ to a value
$t - \delta + \varepsilon \in [t - \delta, t]$.
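As a sanity check on the bookkeeping above, the following sketch lists the delays spent in the dwell-time PTG for one controller move and checks that they add up to the perturbed delay $t + \varepsilon$ of the RPTG. The function names and the string labels for locations are assumptions made for illustration only.

```python
from fractions import Fraction

def simulate_perturbation(t, eps, delta):
    """Delays spent in the dwell-time PTG for a controller delay t in the
    RPTG that perturbator changes by eps in [-delta, delta]."""
    assert t >= delta and -delta <= eps <= delta
    # Player 1 waits t - delta at l, then the urgent location (l, e) is crossed.
    steps = [("l", t - delta), ("(l,e)", Fraction(0))]
    if eps >= 0:
        # positive perturbation: dwell time delta + eps lies in [delta, 2*delta]
        steps.append(("(l,e)+", delta + eps))
    else:
        # negative perturbation: dwell time delta - eps' lies in [0, delta]
        steps.append(("(l,e)-", delta - (-eps)))
    return steps

def total_delay(steps):
    """Total time elapsed along the listed steps."""
    return sum(delay for _, delay in steps)

delta = Fraction(1, 10)
positive = simulate_perturbation(Fraction(1, 2), Fraction(1, 20), delta)
negative = simulate_perturbation(Fraction(1, 2), Fraction(-1, 20), delta)
```

In both cases the total elapsed time equals $t + \varepsilon$, which is what makes the valuations in the two games agree after the move.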

Similarly, given a strategy $\sigma'$ in $\mathcal{G}$, we shall construct the equivalent strategy
$\sigma$ in $\mathcal{R}$ as follows.

1. Player 1 strategy to controller strategy: If $\sigma'_1(s)$ proposes a delay $t$ then
$\sigma_1(f^{-1}(s))$ proposes a delay $t + \delta$.

2. Player 2 strategy to perturbator strategy in perturbator locations: If $\sigma'_2(s)$
proposes a delay $t$ then $\sigma_2(f^{-1}(s))$ also proposes $t$.

3. Player 2 strategy to perturbator strategy in controller locations: Suppose
$\sigma'_2((l, e), \nu + t)$ suggests the path $((l, e)^+, \nu) \to (l', \nu')$. Then
$\sigma_2((l, \nu), t + \delta, e)$ leads to $(l', \nu')$.

► Lemma 21. In the RPTG $\mathcal{R}$ given in Figure 11, if $g$ is $0 \le x \le 1$ then B is
reached with $x \in [0, 1 + \delta]$. In the corresponding PTG $\mathcal{G}$ too, B is reached
with $x \in [0, 1 + \delta]$. We can establish the same for other possible guards too.

► Lemma 22. $\mathrm{Cost}(s \to s') = \mathrm{Cost}(f(s) \to^{*} f(s'))$. That is, the cost of a
transition from $s$ to $s'$ in the RPTG $\mathcal{R}$ is the same as the cost of going from $f(s)$
to $f(s')$ in the dwell-time PTG $\mathcal{G}$. However, we need multiple transitions to reach
$f(s')$ from $f(s)$.

Both the above lemmas follow from the definition of $\eta'$ and the delays adjusted over $l$,
$(l, e)^-$ and $(l, e)^+$ in the PTG $\mathcal{G}$.

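► Lemma 23. Given a strategy $\sigma_1$ in $\mathcal{R}$ and the corresponding strategy $\sigma'_1$
in $\mathcal{G}$, for every state $s$ in $\mathcal{R}$,
$\mathrm{Cost}(s, \sigma_1) = \mathrm{Cost}(f(s), \sigma'_1)$.

Proof. Recall that
$\mathrm{Cost}_{\mathcal{R}}(s, \sigma_1) = \sup_{\sigma_2 \in \mathrm{Strat}_2(\mathcal{R})} \mathrm{Cost}_{\mathcal{R}}(\mathrm{Outcome}(s, \sigma_1, \sigma_2))$.

Part 1: $\mathrm{Cost}_{\mathcal{R}}(s, \sigma_1) \le \mathrm{Cost}_{\mathcal{G}}(f(s), \sigma'_1)$

Consider a strategy $\sigma_2$ in $\mathcal{R}$. We can construct a strategy $\sigma'_2$ in
$\mathcal{G}$ as outlined above. From Lemma 22 it is clear that
$\mathrm{Cost}_{\mathcal{R}}(\mathrm{Outcome}(s, \sigma_1, \sigma_2)) \le \mathrm{Cost}_{\mathcal{G}}(\mathrm{Outcome}(f(s), \sigma'_1, \sigma'_2))$.

Part 2: $\mathrm{Cost}_{\mathcal{G}}(f(s), \sigma'_1) \le \mathrm{Cost}_{\mathcal{R}}(s, \sigma_1)$

Consider a strategy $\sigma'_2$ in $\mathcal{G}$. We can construct a strategy $\sigma_2$ in
$\mathcal{R}$ as outlined above. The selected semantics of $\mathcal{G}$ and Lemma 21 ensure that
all the delays proposed by $\sigma'_2$ can be emulated in $\mathcal{R}$ too. ◄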


Along the same lines as the lemma above, we could also prove that
$\mathrm{Cost}(s, \sigma_2) = \mathrm{Cost}(f(s), \sigma'_2)$. These two results pave the way for
relating the optimal costs for states in the two games. We shall establish
$\mathrm{OptCost}_{\mathcal{R}}(s) = \mathrm{OptCost}_{\mathcal{G}}(f(s))$ by proving two inequalities:

(1) $\mathrm{OptCost}_{\mathcal{R}}(s) \le \mathrm{OptCost}_{\mathcal{G}}(f(s))$ and

(2) $\mathrm{OptCost}_{\mathcal{G}}(f(s)) \le \mathrm{OptCost}_{\mathcal{R}}(s)$.

$\mathrm{OptCost}_{\mathcal{R}}(s) \le \mathrm{OptCost}_{\mathcal{G}}(f(s))$

Consider a strategy $\sigma_1$ in $\mathcal{R}$ and construct an equivalent strategy $\sigma'_1$
in $\mathcal{G}$ (this is possible by Lemma 20). Now we shall prove that

$\sup_{\sigma_2 \in \mathrm{Strat}_2(\mathcal{R})} \mathrm{Cost}(\mathrm{Outcome}(s, \sigma_1, \sigma_2)) = \sup_{\sigma'_2 \in \mathrm{Strat}_2(\mathcal{G})} \mathrm{Cost}(\mathrm{Outcome}(f(s), \sigma'_1, \sigma'_2))$.

To this end, let us consider a perturbator strategy $\sigma_2$ in $\mathcal{R}$. Then we can
construct an equivalent Player 2 strategy $\sigma'_2$ such that
$\mathrm{Cost}(\mathrm{Outcome}(s, \sigma_1, \sigma_2)) = \mathrm{Cost}(\mathrm{Outcome}(f(s), \sigma'_1, \sigma'_2))$
(this follows from Lemma 22). Thus, we have shown that the set of strategies in $\mathcal{G}$ is at
least as large as that in $\mathcal{R}$, and whatever costs are achieved in $\mathcal{R}$ can be
achieved in $\mathcal{G}$ too.


$\mathrm{OptCost}_{\mathcal{G}}(f(s)) \le \mathrm{OptCost}_{\mathcal{R}}(s)$

We shall now construct strategies in $\mathcal{R}$ from strategies in $\mathcal{G}$. If
$\sigma'_1(s)$ proposes a delay $t$ then $\sigma_1(f^{-1}(s))$ proposes $t + \delta$. Lemma 21
ensures that $t + \delta$ will satisfy the guard. For example, if the guard was $0 \le x \le 1$ in
$\mathcal{R}$ then the delay chosen by $\sigma'_1$ is at most $1 - \nu(x) - \delta$.

Similarly, we construct the strategy $\sigma_2$ from $\sigma'_2$ as specified above. If
$\sigma'_2((l, e), \nu + t)$ suggests the path $((l, e)^+, \nu) \to (l', \nu')$, then
$\sigma_2((l, \nu), t + \delta, e)$ leads to $(l', \nu'')$. We know that $\nu'' = \nu'$.

Once again, Lemma 21 ensures that if $\nu'$ is in an interval $I$ then $\nu'' \in I$. For example,
for the guard $0 \le x \le 1$, we have $\nu', \nu'' \in [0, 1 + \delta]$.

Once we have mapped the strategies, the proof of
$\mathrm{OptCost}_{\mathcal{G}}(f(s)) \le \mathrm{OptCost}_{\mathcal{R}}(s)$ is along the same
lines as the previous case.

► Lemma 24. If $\sigma_1$ in $\mathcal{R}$ is $(\varepsilon, N)$-acceptable then $\sigma'_1$ in
$\mathcal{G}$ is also $(\varepsilon, N)$-acceptable.

A strategy in $\mathcal{R}$ is said to be $(\varepsilon, N)$-acceptable if (1) it is memoryless,
(2) it is $\varepsilon$-optimal for every state and (3) it partitions $[0, 1 + \delta]$ into at
most $N$ intervals.

From the definition of the equivalent strategy $\sigma'_1$, it is easy to see that if $\sigma_1$
is memoryless then so is $\sigma'_1$. Additionally, if $\sigma_1$ has $n$ intervals then
$\sigma'_1$ also has $n$ intervals, except that the end points of the intervals are shifted by
$\delta$, as the delays prescribed by $\sigma'_1$ are $t - \delta$ when $\sigma_1$ suggests $t$.
Finally, $\varepsilon$-optimality is preserved by Lemma 23.


D Dwell-time PTG to Dwell-time FRPTG

Figure 11 FRPTG

We have already defined in Section 5.2 the mapping between the states of the dwell-time
PTG $\mathcal{G}$ and the constructed dwell-time FRPTG $\mathcal{G}_F$. The mapping is defined in
such a way that a state $(l, \nu)$ in $\mathcal{G}$ is mapped to the state $(l_b, \nu - b)$
whenever $\nu \in [b, b + 1]$. The integral part of the clock valuation is remembered in the state
itself, while the valuation always stays in $[0, 1]$. The only exception to this is the location
$(l, e)_b^+$, where the clock valuation can go up to $1 + \delta$. The state $((l, e)_b^+, \nu)$ in
$\mathcal{G}_F$ is the image of the state $((l, e)^+, b + \nu)$ in $\mathcal{G}$.
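A minimal sketch of this mapping is given below. It is illustrative only: the name `to_frptg_state` and the pair encoding of $l_b$ are assumptions, an integral valuation is sent to the copy given by its floor (the text allows either neighbouring copy at the boundary), and the exceptional locations $(l, e)_b^+$ are not modelled.

```python
import math
from fractions import Fraction

def to_frptg_state(location, valuation):
    """Map a dwell-time PTG state (l, v) to the FRPTG state (l_b, v - b).

    The integral part b is stored with the location; the remaining clock
    value is the fractional part. Assumption: b = floor(v).
    """
    b = math.floor(valuation)
    return ((location, b), valuation - b)

image = to_frptg_state("l", Fraction(7, 3))
boundary = to_frptg_state("l", Fraction(2))
```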

► Lemma 25. Given a path $\rho$ in $\mathcal{G}$ from $s$ to $s'$, there exists a unique path
$\rho'$ in $\mathcal{G}_F$ from $f(s)$ to $f(s')$. Additionally, $\mathrm{Cost}(\rho) = \mathrm{Cost}(\rho')$.

The proof of Lemma 25 is straightforward, given the mapping $f$. Any time elapse of 1
in any one state $(l, \nu)$ in $\mathcal{G}$ is captured by starting from some $(l_b, \nu - b)$ and
moving to $(l_{b+1}, (\nu + 1) - (b + 1))$ in $\mathcal{G}_F$, and so on. Whenever the clock value
reaches an integral value in $\mathcal{G}$, correspondingly in $\mathcal{G}_F$ the state is updated
by remembering the new integral part and updating the clock valuation to 0. Every path in
$\mathcal{G}$ corresponds to a path in $\mathcal{G}_F$, where the constraints on the path are
shifted by an appropriate integer, depending on the integral value remembered in the current state.

This also gives a mapping between the strategies of $\mathcal{G}$ and $\mathcal{G}_F$. Also, the
costs are preserved across paths: any path in $\mathcal{G}$ is mapped to a longer path in
$\mathcal{G}_F$ so that the individual time delays in $\mathcal{G}_F$ never exceed 1. Since the
prices of states are preserved by the mapping, the costs add up to the same value. It is easy to
see that a copy-cat strategy works between $\mathcal{G}$ and $\mathcal{G}_F$, and hence costs and
optimal costs are preserved. Since strategies are copy-cat, all properties like
$(\varepsilon, N)$-acceptability are also preserved across games.

E FRPTG to Reset-free FRPTG

Given a one-clock FRPTG $\mathcal{G}_F = (L_1, L_2, \{x\}, X, \eta, T)$ with $n$ resets (including
fractional resets), we define a reset-free FRPTG as follows:
$\mathcal{G}_F' = (L_1', L_2', \{x\}, X', \eta', T')$ where

- for $l \in L_1$ and $0 \le j \le n$, we have $l^j \in L_1'$;

- for $l \in L_2$ and $0 \le j \le n$, we have $l^j \in L_2'$;

- $S \notin L_1 \cup L_2$ is a sink location such that $S \in L_2'$;

- $X'$ has the following transitions:

  - $l^j \xrightarrow{g} l'^j$ if $l \xrightarrow{g} l' \in X$;

  - $l^j \xrightarrow{g} l'^{j+1}$ if $l \xrightarrow{g, r} l' \in X$, $j < n$ and $r$ is either $\{x\}$ or $[x] := 0$;

  - $l^n \xrightarrow{g} S$ if $l \xrightarrow{g, r} l' \in X$ and $r$ is either $\{x\}$ or $[x] := 0$;

  - $S \to S$;

- $\eta'(l^j) = \eta(l)$ and $l^j \in T'$ if $l \in T$.
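The unfolding defined above can be sketched as follows. The data layout (tuples for transitions, a boolean flag marking any kind of reset, the string "S" for the sink) and the name `unfold_resets` are assumptions for illustration; the price of the sink is left out of the sketch.

```python
def unfold_resets(player1, player2, transitions, prices, targets, n):
    """Build n + 1 reset-free copies of a one-clock game plus a sink.

    transitions: list of (src, guard, has_reset, dst), where has_reset is
    True for both ordinary and fractional resets (hypothetical encoding).
    """
    sink = "S"
    copies = range(n + 1)
    locs1 = {(l, j) for l in player1 for j in copies}
    locs2 = {(l, j) for l in player2 for j in copies} | {sink}
    new_transitions = [(sink, None, sink)]  # self-loop S -> S
    for src, guard, has_reset, dst in transitions:
        for j in copies:
            if not has_reset:
                # reset-free transitions stay inside copy j
                new_transitions.append(((src, j), guard, (dst, j)))
            elif j < n:
                # a reset moves from copy j to copy j + 1
                new_transitions.append(((src, j), guard, (dst, j + 1)))
            else:
                # a reset taken in the last copy leads to the sink
                new_transitions.append(((src, n), guard, sink))
    new_prices = {(l, j): prices[l] for l in prices for j in copies}
    new_targets = {(l, j) for l in targets for j in copies}
    return locs1, locs2, new_transitions, new_prices, new_targets

result = unfold_resets(
    {"A"}, {"B"},
    [("A", "x<=1", False, "B"), ("B", "x<=1", True, "A")],
    {"A": 1, "B": 2}, set(), n=1,
)
```

With one reset in the input, the sketch produces two copies of each location, and the reset taken in the second copy is redirected to the sink.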

We illustrate the construction of a reset-free FRPTG in Figure 13, corresponding to the
FRPTG in Figure 12. Note that the locations in the upper rectangle form the first copy
$\mathcal{G}_F$-0, while the lower rectangle forms the second copy $\mathcal{G}_F$-1. A copy
$\mathcal{G}_F$-$i$ indicates the number of resets seen so far from the initial location $l^0$ of
the first copy $\mathcal{G}_F$-0.

E.1 Proof of Lemma 13

Proof. Consider any state $(l, \nu)$ in $\mathcal{G}_F$. The reduction from the FRPTG
$\mathcal{G}_F$ to the reset-free FRPTG $\mathcal{G}_F'$ creates a new component (or copy) for each
new reset, including fractional resets. Given that there are a total of $n$ resets in the FRPTG
$\mathcal{G}_F$, $n + 1$ reset-free components are created in the reset-free FRPTG $\mathcal{G}_F'$,
and the last component goes to a location with cost $+\infty$. By assumption, the cycles in each
reset-free component are non-negative. Any cycle in the FRPTG $\mathcal{G}_F$ involving a reset is
mapped to a path in the reset-free FRPTG $\mathcal{G}_F'$ ending at the location $S$ with cost
$+\infty$, while any reset-free cycle in $\mathcal{G}_F$ is mapped to a cycle in one of the $n + 1$
reset-free components of $\mathcal{G}_F'$. Clearly, for every strategy $\sigma$ of player 1 or 2 in
$\mathcal{G}_F$, there is a corresponding strategy $\sigma'$ in $\mathcal{G}_F'$ and vice versa,
obtained using the above mapping of paths between $\mathcal{G}_F$ and $\mathcal{G}_F'$. Given that
the prices of locations are preserved between $\mathcal{G}_F$ and $\mathcal{G}_F'$, the optimal cost
from $(l, \nu)$ in $\mathcal{G}_F$ is the same as the optimal cost from $(l^0, \nu)$ in
$\mathcal{G}_F'$.

Figure 12 FRPTG

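Consider an $(\varepsilon, N)$-acceptable strategy $\sigma'$ in $\mathcal{G}_F'$. Consider a
winning state $(l^0, \nu)$. Let $i$ be the minimum number of resets from state $(l^0, \nu)$ along
any path compatible with $\sigma'$. That is, the player can win from $(l^{n-i}, \nu)$ but not from
$(l^{n-i+1}, \nu)$. If $(l, \nu)$ is not winning then we take $i = n + 1$. We denote by
$\sigma'_{n-i}$ the suggestions made by $\sigma'$ in the $(n-i)$th copy in $\mathcal{G}_F'$. We then
assign $\sigma(l, \nu) = \sigma'_{n-i}(l^{n-i}, \nu)$. Thus, we obtain that
$\mathrm{Cost}_{\mathcal{G}_F}((l, \nu), \sigma) = \mathrm{Cost}_{\mathcal{G}_F'}((l^{n-i}, \nu), \sigma')$.

Since $(l^0, \nu)$ and $(l^{n-i}, \nu)$ have the same outgoing transitions, we know that the
strategy $\sigma'$ from $(l^0, \nu)$ will be at least as costly as
$\mathrm{OptCost}_{\mathcal{G}_F'}(l^{n-i}, \nu)$. That is,

$$\mathrm{Cost}_{\mathcal{G}_F'}((l^0, \nu), \sigma') \ge \mathrm{OptCost}_{\mathcal{G}_F'}(l^{n-i}, \nu) \qquad (1)$$

Now, if $\mathrm{Cost}_{\mathcal{G}_F'}((l^{n-i}, \nu), \sigma') > \mathrm{Cost}_{\mathcal{G}_F'}((l^0, \nu), \sigma') + \varepsilon$,
then by Equation (1) we have
$\mathrm{Cost}_{\mathcal{G}_F'}((l^{n-i}, \nu), \sigma') > \mathrm{OptCost}_{\mathcal{G}_F'}(l^{n-i}, \nu) + \varepsilon$,
which means $\sigma'$ is not $\varepsilon$-optimal. Thus we have

$$\mathrm{Cost}_{\mathcal{G}_F'}((l^{n-i}, \nu), \sigma') \le \mathrm{Cost}_{\mathcal{G}_F'}((l^0, \nu), \sigma') + \varepsilon$$

and therefore

$$\mathrm{Cost}_{\mathcal{G}_F}((l, \nu), \sigma) = \mathrm{Cost}_{\mathcal{G}_F'}((l^{n-i}, \nu), \sigma') \le \mathrm{Cost}_{\mathcal{G}_F'}((l^0, \nu), \sigma') + \varepsilon \le \mathrm{OptCost}_{\mathcal{G}_F'}(l^0, \nu) + \varepsilon + \varepsilon \le \mathrm{OptCost}_{\mathcal{G}_F}(l, \nu) + 2\varepsilon$$

◄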

We shall now informally explain why fractional resets do not cause a problem. In a PTG without
fractional resets, a resetting transition $e$ (say $l \xrightarrow{x := 0} m$) taken twice takes us
back to the same state $(m, x = 0)$ twice. This crucial property is the backbone of the
transformation which removes resets in [5]. The correctness proof is by constructing
optcost-preserving strategies for Player 1 in both $\mathcal{G}$ and its reset-free equivalent
$\mathcal{G}'$. Given a winning strategy for Player 1 in the PTG $\mathcal{G}$, a strategy in
$\mathcal{G}'$ is constructed so as to ensure that each resetting transition is taken at most once.
This is possible because a resetting transition $e$ appearing the second time results in the same
state $(m, 0)$, and hence the transitions possible (and the optcost achievable) from the second
resultant state $(m, 0)$ can be applied the first time this state occurs. In other words, the
second occurrences of the transition are replaceable as they result in the same unique state
$(m, 0)$. A path avoiding the second occurrence of the resetting transition must exist, as the
strategy is winning for Player 1.

A similar reasoning does not work directly for fractional resets $e'$ (say
$l' \xrightarrow{[x] := 0} m'$), as the resulting state $(m', x)$ after a fractional reset
transition is not unique (the clock $x \in [0, \delta]$), and thus we cannot adopt this argument
directly. Firstly, the player 2 location $(l, e)_i^+$ is entered with $x \le 1 - \delta$ (see
Transformation 2) and a delay $d$ makes $x \in [1, 1 + \delta]$. This delay happens entirely in
this location and is chosen entirely by Player 2. From $(l, e)_i^+$, if player 2 moves to a
$(l, e)_{i+1}^{0}$ location, then the value of $x$ is in $[0, \delta]$. Note that the value of $x$,
say $\zeta$, in $(l, e)_{i+1}^{0}$ is indeed the perturbation that happened in the RPTG
$\mathcal{R}$: in the FRPTG, at $(l, e)_i^+$, player 2 elapses $\delta + \zeta$. Recall that if in
$\mathcal{R}$ a location $l$ was entered with the value of $x$ being $\nu$, then in the FRPTG we
enter $(l, e)_i$ and $(l, e)_i^+$ with $\nu - i - \delta$. Player 2 at $(l, e)_i^+$ makes this
value $\nu - i + \zeta$, which is exactly the same as the perturbed value of $x$ in $\mathcal{R}$
when perturbator chooses a positive perturbation. The point to note is that whenever player 2
returns to $(l, e)_i^+$, the control of the perturbation is his; thus, any $\zeta$ that is achieved
the $k$th time can be achieved the first time itself by player 2. Moreover, if player 2 has a
strategy to revisit $(l, e)_i^+$, then clearly player 1 will lose, since after $n + 1$ times the
control reaches the target with cost $\infty$. Note also that in Algorithm 1, while we solve for
$(l, e)_i^+$, we will have the optcost function computed for $(l, e)_{i+1}^{0}$. Player 2 will
choose to delay $\delta + \zeta$ for that $\zeta$ where the cost is maximal in the optcost of
$(l, e)_{i+1}^{0}$.

Figure 13 Reset-free FRPTG: two copies $\mathcal{G}_F$-0 and $\mathcal{G}_F$-1 corresponding to the
number of resets encountered so far, i.e., $\mathcal{G}_F$-0 indicates that 0 resets have been seen
so far and $\mathcal{G}_F$-1 indicates that 1 (fractional) reset has been seen.


F Example: Solve Reset-Free FRPTG

We shall first look at how normal resets are handled. As detailed by the reset-free
construction, each copy of the FRPTG is an SCC and there are $n + 1$ copies when the FRPTG has $n$
resets. The $(i+1)$th copy is solved and its optcost functions are used as outside cost functions
while solving the $i$th copy. This is depicted in the figure below.

[Figure: cost functions plotted against $x$ over $[0, 1 + \delta]$.]

Location $L$ in $\mathcal{G}_n$ has two transitions with resets: one to a target and another to a
location $M$ in $\mathcal{G}_{n+1}$ whose optcost function has already been computed. Due to the
clock reset, the target or $M$ is entered with clock value $x = 0$, and hence the only values of
interest are the cost function $f$ of the target at $x = 0$, i.e., $f(0) = 1$, and the optcost for
location $M$ at $x = 0$, i.e., 1.2. Since $L$ is a Player 1 location, the lower of these two values
is picked and the corresponding transition is selected.
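The choice at $L$ amounts to comparing the two costs at $x = 0$. A small sketch, with hypothetical transition names and the two values taken from the example above:

```python
def resolve_reset_choice(options, player):
    """Pick the resetting transition for a location whose successors are all
    entered with x = 0.

    options maps a (hypothetical) transition name to the successor's cost at
    x = 0. Player 1 minimises the cost and player 2 maximises it.
    """
    pick = min if player == 1 else max
    name = pick(options, key=options.get)
    return name, options[name]

choice = resolve_reset_choice({"to_target": 1.0, "to_M": 1.2}, player=1)
```

For a player 2 location the same comparison would be made with the maximum instead of the minimum.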


Now let us consider fractional resets. The following figure depicts the previously
considered example with the normal reset replaced by a fractional reset.

[Figure: plots against $x$ of the optcost of $(A, e)_{b+1}^{0}$ and of the cost of taking the fractional-reset transition from $(A, e)_b^+$.]

Recall from Transformation 2 (dwell-time PTG $\mathcal{G}$ to FRPTG $\mathcal{G}_F$) that
fractional resets occur only along transitions from $(A, e)_b^+$ to $(A, e)_{b+1}^{0}$. Let us call
this transition $e_1$. From the construction of the FRPTG, we also know that the only other
transition possible from $(A, e)_b^+$ is to location $B_b$, corresponding to the transition from
$A$ to $B$ in the RPTG. Let us denote this transition by $e_2$. By construction of the reset-free
FRPTG, the constraint on $(A, e)_b^+ \to B_b$ is $x \le 1$. Figuring out which part of the cost
functions of $B_b$ and $(A, e)_{b+1}^{0}$ to consider for the optcost computation of $(A, e)_b^+$
is a little different from the normal reset case. Here the guards on the transitions can be
considered as $0 \le x \le 1$ for $e_2$ and $1 \le x \le 1 + \delta$ for $e_1$.

Hence we should consider the entire cost function of $B_b$, while taking only the
$x \in [0, \delta]$ part from the function of $(A, e)_{b+1}^{0}$. Recall that fractional resets
remove the integer part of $x$, thereby taking $x$ from $[1, 1 + \delta]$ to $[0, \delta]$. Thus the
cost function of taking the transition to $(A, e)_{b+1}^{0}$ is equal to (cost of waiting at
$(A, e)_b^+$ till $x = \nu \in [1, 1 + \delta]$) + (optcost of $(A, e)_{b+1}^{0}$ at $\nu - 1$). We
compute this cost as outlined in Figure 15. It is easy to see that since the price of $(A, e)_b^+$
is 1 and the slope of the optcost function of $(A, e)_{b+1}^{0}$ is $-1.2$, it is more profitable
to reach $(A, e)_{b+1}^{0}$ at $x = 0$ than at $x = \delta$, i.e., to take the transition $e_1$ when
$x = 1$, thus reaching $(A, e)_{b+1}^{0}$ at $x = 0$ after the fractional reset, rather than to wait
at $(A, e)_b^+$ till $x = 1 + \delta$ and then reach $(A, e)_{b+1}^{0}$ at $x = \delta$, which yields
only 2.16 (waiting till $x = 1 + \delta$ incurs 1.2 and the optcost of $(A, e)_{b+1}^{0}$ at
$x = \delta$ is 0.96).

We consider this cost function of taking the transition $e_1$ and the cost function of $B_b$ to
compute the optcost function of $(A, e)_b^+$. In this example, it is clearly better for Player 2 to
take $e_1$ at $x = 1$, and hence the cost function of taking $e_1$ is the optcost of $(A, e)_b^+$.
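The numbers in this example can be reproduced with the sketch below. Two values are inferred rather than stated in the text and should be read as assumptions: $\delta = 0.2$ (from "waiting till $x = 1 + \delta$ incurs 1.2" at price 1, with the wait starting at $x = 0$) and the intercept 1.2 of the successor's optcost function (from the slope $-1.2$ and the value 0.96 at $x = \delta$).

```python
from fractions import Fraction

DELTA = Fraction(1, 5)       # assumption: inferred from the cost 1.2 at x = 1 + delta
PRICE = Fraction(1)          # price of the location where player 2 waits (from the text)
SLOPE = Fraction(-6, 5)      # slope of the successor's optcost function (from the text)
INTERCEPT = Fraction(6, 5)   # assumption: chosen so that the optcost at delta is 0.96

def successor_optcost(x):
    """Optcost of the successor reached after the fractional reset."""
    return SLOPE * x + INTERCEPT

def cost_of_e1(v):
    """Cost of waiting from x = 0 until x = v in [1, 1 + delta] and then
    taking the fractional-reset transition, which enters the successor at v - 1."""
    assert 1 <= v <= 1 + DELTA
    return PRICE * v + successor_optcost(v - 1)

early = cost_of_e1(Fraction(1))   # take e1 as soon as x = 1
late = cost_of_e1(1 + DELTA)      # wait until x = 1 + delta
```

Under these assumptions, leaving at $x = 1$ costs 2.2 while leaving at $x = 1 + \delta$ costs 2.16, so the maximising player 2 prefers the earlier exit, in line with the discussion above.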


G Algorithm for OptCost Computation: $l_{min}$ is a player 1 location

We first prove Lemma 15.

Proof. The optcost computation for a location $l_{min}$ is done using the already computed
optcosts of all successors of $l_{min}$, which we now treat as outside cost functions. Steps 1
and 2 in Algorithm 1 superimpose the outside cost functions corresponding to $l_{min}$ and take
the interior. Recall that Step 3 is applied right to left: we start the selective replacement from
$\nu = 1 + \delta$ and proceed towards 0. We know that up to $v_i$, for all $\nu \ge v_i$,
$\mathrm{OptCost}(l_{min}, \nu) = f(\nu)$.

Now we have to compute the optcost for $\nu \in [u_i, v_i]$. As we have taken the interior of the
superimposed function in Step 2, we know that $g_i$ is the best (lowest) possible cost if we do
not delay at $l_{min}$. Let us determine whether delaying at $l_{min}$ is more profitable than
following $g_i$. The two options we have are:

1. Follow $g_i$, whose slope is $-m$. The line segment $g_i$ is given by $y = -mx + c$ where
$c = f(v_i) + m v_i$, since $y = g_i(v_i) = f(v_i)$ at $x = v_i$ ($f$ is continuous and is composed
of $g_1, \ldots, g_m$), and

2. delay at $l_{min}$ till $x = v_i$ and exit at $v_i$. The line segment corresponding to the delay
at $l_{min}$ is $y = -\eta(l_{min}) x + c'$ where $c' = f(v_i) + \eta(l_{min}) v_i$, as we delay at
$l_{min}$ until $x = v_i$ and then follow $f$, thus obtaining $f(v_i)$ at $x = v_i$.

Now comparing these two equations, with $\sim$ standing for the relation to be determined, we get
the following.

$$\begin{aligned}
-mx + c &\sim -\eta(l_{min}) x + c' \\
-mx + f(v_i) + m v_i &\sim -\eta(l_{min}) x + f(v_i) + \eta(l_{min}) v_i \\
-mx + m v_i &\sim -\eta(l_{min}) x + \eta(l_{min}) v_i \\
m (v_i - x) &\sim \eta(l_{min}) (v_i - x)
\end{aligned}$$

If $x < v_i$ and $\eta(l_{min}) < m$, then we conclude that $\sim$ is $>$.

Thus, we observe that delaying at $l_{min}$ is better. The above discussion is for Player 1 but
can be easily adapted to Player 2. In a similar fashion, we can argue that delaying at $l_{min}$
till $x = v'_i < v_i$ is worse than delaying till $x = v_i$, i.e., Player 1 prefers to wait until
$v_i$ instead of exiting and following $g_i$ at some point $v'_i < v_i$. ◄
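The comparison in the proof can be checked numerically. The sketch below evaluates both options at a point $x < v_i$; the function names and the sample values are assumptions chosen only for illustration.

```python
from fractions import Fraction

def follow_cost(x, m, v_i, f_vi):
    """Cost of leaving immediately along g_i: y = -m*x + c with c = f(v_i) + m*v_i."""
    return -m * x + (f_vi + m * v_i)

def delay_cost(x, eta, v_i, f_vi):
    """Cost of waiting at the location (price eta) until x = v_i and then
    following f: y = -eta*x + c' with c' = f(v_i) + eta*v_i."""
    return -eta * x + (f_vi + eta * v_i)

# Sample values (hypothetical): slope -3 for g_i, price 2, v_i = 1, f(v_i) = 0.
x = Fraction(1, 2)
follow = follow_cost(x, m=3, v_i=Fraction(1), f_vi=Fraction(0))
delay = delay_cost(x, eta=2, v_i=Fraction(1), f_vi=Fraction(0))
```

The difference between the two costs is $(m - \eta(l_{min}))(v_i - x)$, which is positive exactly when $\eta(l_{min}) < m$ and $x < v_i$, so the minimising Player 1 prefers to delay in that case.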


G.1 OptCost Computation for All Constraints

In the computation in Algorithm 1 we have assumed that all the transitions $l \to l'$ have
a guard $0 \le x \le 1$. We shall now illustrate how to compute the optcost of $l$ if the guards on
the outgoing transitions are different.

While optimal strategies are possible with closed constraints, it is known that optimal
strategies need not exist with open constraints.

Figure 14 Optcost Computation for guard $0 < x < 1$

Constraints on all the outgoing edges are $0 < x < 1$

We shall illustrate how to obtain $\varepsilon$-optimal strategies with open constraints. Consider
Figure 14. Here the guard is $0 < x < 1$ and clearly $\mathrm{OptCost}_{\mathcal{G}}(l, 0) = 2$, but
there is no strategy that achieves this value. Hence we want to find an $\varepsilon$-optimal
strategy achieving a cost of at most $2 + \varepsilon$. Pick $t = \frac{\varepsilon}{m_{max} + 1}$
(see footnote 4), where $m_{max}$ is the slope with the largest absolute value seen among the
outside cost functions. Here $m_{max} = 3$ ($\mathrm{OptCost}_A$ is $y = -3x + 3$). Let
$\varepsilon = 0.1$. Then $t = 0.025$. Now let us fix the strategy to wait at $l$ till $x = 1 - t$
and go to $A$ at $x = 1 - t$. The cost from $(l, 0)$ under this strategy is
$2 \cdot (1 - t) + f(1 - t)$, where $f$, given by $y = -3x + 3$, is the optcost function of $A$.
Thus the cost obtained is $2.025 < 2 + 0.1$. Extending this to several successors of $l$ is simple
and follows all the steps of Algorithm 1. At $1 - t$, take the transition to the location
prescribed by $f$ in Step 4. Note that this method would work for $0 < x < 1 - \delta$ by simply
replacing 1 with $1 - \delta$ in the discussion above.
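The arithmetic of this example can be replayed with a few lines. The function name and its parameters are assumptions for illustration; the sketch covers only the single-successor case with the location entered at $x = 0$.

```python
from fractions import Fraction

def epsilon_optimal_exit(eps, m_max, price, successor_cost, upper=Fraction(1)):
    """Return (t, cost): the safety margin t = eps / (m_max + 1) and the cost
    of waiting from x = 0 until x = upper - t before moving to the successor."""
    t = eps / (m_max + 1)
    exit_point = upper - t
    cost = price * exit_point + successor_cost(exit_point)
    return t, cost

# Values from the example: eps = 0.1, m_max = 3, price 2, successor cost -3x + 3.
t, cost = epsilon_optimal_exit(Fraction(1, 10), 3, 2, lambda x: -3 * x + 3)
```

The resulting margin is $t = 1/40 = 0.025$ and the cost is $81/40 = 2.025$, which is within $\varepsilon = 0.1$ of the unattainable value 2. Passing `upper=1 - delta` would give the variant for the guard $0 < x < 1 - \delta$.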


Constraint $1 - \delta < x < 1$

In Transformation 1 from RPTG to dwell-time PTG, we replaced the constraint $H < x < H + 1$
by $H - \delta < x < H + 1 - \delta$. Such a constraint would correspond in the reset-free FRPTG to
$1 - \delta < x < 1$ or $0 < x < 1 - \delta$. We have already dealt with the constraint
$0 < x < 1 - \delta$. Now we shall highlight the difference needed to make it work for
$1 - \delta < x < 1$. We compute as usual, by applying the steps of Algorithm 1, and also get the
prescribed strategy out of the final function $f'$. Now if the computed strategy $\sigma$ for $l$
suggests taking a transition in the interval $[0, 1 - \delta]$, then instead of this transition we
prescribe waiting at $l$. This is because the guard on the outgoing transition(s) is
$1 - \delta < x < 1$. The rest of the strategy prescribed by $f'$ over $(1 - \delta, 1]$ is
retained as is.


Constraints on outgoing edges are $x = 0$, $0 < x < 1$, $x = 1$

Figure 15 explains how to solve for the optcost if the outgoing transitions have different guards.

As Player 1 can go to $A$ only if $x = 0$, we need to consider only that point of
$\mathrm{OptCost}(A)$ while computing the optcost of $l$. Similarly, player 1 can go to $C$ only
when $x = 1$. Thus the function to consider for taking the transition to $C$ is the cost of the
path (or action) of delaying in $l$ while $0 < x < 1$ and going to $C$ at $x = 1$. Upon reaching
$C$ at $x = 1$, the cost incurred will be $\mathrm{OptCost}(C, 1) = 0.5$. Delaying at $l$ at the
rate of $\eta(l) = 2$ yields a function with slope $-2$ passing through the point $(1, 0.5)$
(corresponding to going to $C$ at $x = 1$).


Footnote 4: If there are $n$ transitions in the longest path from source to target then
$t = \frac{\varepsilon}{n \cdot (m_{max} + 1)}$.

Figure 15 Optcost Computation for different guards (transitions from $l$, $\eta(l) = 2$)

H $l_{min}$ is a player 2 location

Here we discuss in detail how to take care of the dwell-time requirements while running
Algorithm 1. Recall that there are three kinds of player 2 locations: urgent, those with
dwell-time requirement $[0, \delta]$ and those with dwell-time requirement $[\delta, 2\delta]$.


1. 

2 . 


we know that Player 2 will want to spend as 


Urgent location : superimpose and take exterior (only Steps 1 and 2). 

[0, 5]-delay location : From Lemma 15 
much time as possible at a location l m i n while keeping x < ry whenever there is a function 
< 7 i over [ui,Vi] whose slope is > —r)(l m i n )- Note that we proved Lemma 15 for player 1, 
however, an analogous result works when Z min is a player 2 location. 

Thus, if l m in is entered at x = v £ [uj,ty — 5], then player 2 spends 5 time and exits 
(as 5 is the maximum delay permitted in l m in by the dwell-time restriction) at v + 5 to 
the successor as prescribed by / at x = v + 5. If l m in is entered at x = v £ [iy — 5, iy], 
then player 2 spends iy — v at l m in and exits at ry to the successor as prescribed by / at 
Vi. In the superimposed optcost function /, a function <jy : y = —rnx + c having domain 
[rti,iy] with slope less than —r](l) is replaced as follows : alter gi from y = —mx + c to 
y = —m(x + 5) + c + rj(l ) * 5 = —mx + c + (??(/) — m) * 5 for x £ [iq, — 5]. Let us denote 
the new function as hi over the domain [ui,Vi — 5]. This corresponds to spending 5 time 
until x < Vi- 

When x £ [ty — 5, ty], then Player 2 spends the time v % — x at l m i n before proceeding, as 
prescribed by / from ry onwards. Thus the function obtained by replacing gi for this range 
[vi — 5,iy], h'i is y = —rj(l m i n )x + d , and passes through the point (iy,/(iy)). However, 
h[ should intersect with hj at iy — 5 to make the resulting improved optcost function 
continuous (and thus usable by the predecessors of l m in)- We shall show that the line 
passing through the two points (iy — 5, hi(vi — 5)) and (ty, f('Vi)) has a slope —to' = —y(l)- 
We have , the original cost function, and hi, and we know that from ry onwards, Player 2 
has to continue with the optcost as dictated by / (the superimposed function). Thus we 
know that from the point (ry — 5, hi(yi — 5)) of the new function hi, the optcost will proceed 
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towards the point (w»,/(«*)) (recall that gi(vi) = /(u,)). Thus given these two points, we 
find the line h! i = —m'x + d as follows. 


—m 


f 


f(vj) ~ hi(vj - 8) 

Vi - (Vi - 8) 

[—m * Vi + c] — [—m(vi — 8) + c + (f)(1) — m) * 5] 
6 

— f)(l) * S 


6 


-v(l) 


Similarly, we also find d by using h( = —m'x + d where slope is —m' = —f)(l) and this 
line passes through the point (uj,/(vj)). 

f(vi) = * Vi + d 

—m * Vi + c = —rj(l) *Vi + d 

d = c + (f)(1) — m) * Vi 

3. $[\delta, 2\delta]$-delay location: For every function $g_i$ in $f$ (the superimposed function) of Step
2, we first apply the modification of always spending $\delta$ delay at $l_{min}$. This is achieved by
changing it from $y = -mx + c$ to $y = -m(x + \delta) + c + \eta(l)\,\delta$. The domain of $g_i$ also changes
from $[u_i, v_i]$ to $[u_i - \delta, v_i - \delta]$. Thus the entire superimposed function $f$ has been modified
to $f'$ (let us call it the adjusted superimposed function). After this, proceed with $l_{min}$
as though it were a $[0, \delta]$-delay location, taking $f'$ to be its adjusted superimposed
function.
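
The segment adjustments of Steps 2 and 3 can be checked numerically. The Python sketch below is illustrative only: the function names `adjust_segment` and `shift_segment` and the sample values are assumptions introduced here, not part of Algorithm 1. It builds $h_i$ and $h'_i$ for one segment $g_i(x) = -mx + c$ and confirms that they meet at $v_i - \delta$ and that $h'_i$ reaches $g_i(v_i)$ at $v_i$.

```python
from fractions import Fraction as F

def adjust_segment(m, c, eta, delta, v):
    """Return (h, h_prime) for a segment g(x) = -m*x + c whose domain ends at v.

    h       : cost of waiting delta at l_min, used on [u, v - delta]
    h_prime : cost of waiting v - x at l_min, used on [v - delta, v]
    """
    def h(x):
        return -m * (x + delta) + c + eta * delta

    d = c + (eta - m) * v  # intercept derived in the text

    def h_prime(x):
        return -eta * x + d

    return h, h_prime

def shift_segment(m, c, eta, delta, u, v):
    """Step 3: a mandatory delta delay. Returns (slope, intercept, domain)."""
    # -m*(x + delta) + c + eta*delta == -m*x + (c + (eta - m)*delta)
    return -m, c + (eta - m) * delta, (u - delta, v - delta)

# Hypothetical sample values: g(x) = -x + 5 on [0, 2], eta(l) = 3, delta = 1/2.
m, c, eta, delta, u, v = F(1), F(5), F(3), F(1, 2), F(0), F(2)
h, h_prime = adjust_segment(m, c, eta, delta, v)
assert h(v - delta) == h_prime(v - delta)        # continuity at v - delta
assert h_prime(v) == -m * v + c                  # h' meets g at v
assert (h_prime(v) - h(v - delta)) / delta == -eta  # slope is -eta(l)
```

Exact rationals (`fractions.Fraction`) are used so that the continuity check is an equality rather than a floating-point comparison.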


H.1 Complexity and Termination when $l_{min}$ is a player 2 location

Computing Almost Optimal Strategies: The strategy corresponding to the computed optcost
when $l_{min}$ is a player 2 location is derived as follows.

1. $l_{min}$ is urgent. In this case, we simply perform Steps 1 and 2 of the algorithm (superimpose and
take the exterior), obtaining the function $f$. For $x \in [u_k, v_k]$, the strategy dictates moving
to location $l_k$, since $g_k$ is the optcost function over the domain $[u_k, v_k]$ of the successor $l_k$
of $l_{min}$.

2. $l_{min}$ is a $[0, \delta]$-dwell time location. If $x \in [u_i, v_i - \delta]$ and the function is $h_i$, the strategy
prescribes waiting at $l_{min}$ for $\delta$ amount of time and then proceeding to $l_i$, whose cost function
is $g_i$, the one replaced by $h_i$. If $x \in [u_i, v_i - \delta]$ and the function is $g_i$ (not replaced at Step
3), then the strategy suggests going immediately to $l_i$, whose cost function is $g_i$. Finally,
if $x \in [v_i - \delta, v_i]$ for functions $h'_i$, we prescribe waiting at $l_{min}$ for time $v_i - x$.

3. $l_{min}$ is a $[\delta, 2\delta]$-dwell time location. The strategy prescribes waiting for $\delta$ time at $l_{min}$,
and then uses the strategy prescribed above for $[0, \delta]$-dwell time locations.
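
The case analysis above can be read as a lookup on the clock value $x$. The following Python sketch is a hedged illustration, not the paper's construction: the function names, the string labels, and the boolean flag `replaced` (whether $g_i$ was replaced by $h_i$ and $h'_i$ on this segment) are assumptions introduced here. It also assumes that an unreplaced segment moves to its successor immediately over its whole domain.

```python
from fractions import Fraction as F

def dwell_0_delta(x, u, v, delta, replaced):
    """Move at a [0, delta]-dwell location for a segment with domain [u, v].

    Returns (delay, next_step).
    """
    if not u <= x <= v:
        raise ValueError("x lies outside the segment domain")
    if not replaced:
        # g_i was kept: leave immediately (assumed for the whole domain).
        return 0, "move to successor"
    if x <= v - delta:
        # h_i applies: wait the full delta, then move to the successor.
        return delta, "move to successor"
    # h'_i applies: wait until the clock reaches v, then follow f.
    return v - x, "follow f from v"

def dwell_delta_2delta(x, u, v, delta, replaced):
    """[delta, 2*delta]-dwell location: wait delta, then act as above.

    Here [u, v] is the segment domain before the Step 3 shift, so x is
    expected to lie in [u - delta, v - delta].
    """
    delay, step = dwell_0_delta(x + delta, u, v, delta, replaced)
    return delta + delay, step
```

For instance, with the hypothetical values $u = 0$, $v = 2$, $\delta = 1/2$, calling `dwell_0_delta(F(7, 4), 0, 2, F(1, 2), True)` returns a delay of $1/4$, which is exactly $v - x$.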

We have already discussed the complexity of Algorithm 1 in computing the optcost function
for $l_{min}$ and the almost optimal strategies when $l_{min}$ is a player 1 location. Now we
discuss the case when $l_{min}$ is a player 2 location.

Assume $l_{min}$ is a player 2 location. Let $a(m,p)$ denote the total number of affine segments
appearing in cost functions across all locations. We handle this case by making $l_{min}$ urgent
and solving the modified PTG $Q'$ (where $l_{min}$ is urgent), which has one location less, and then
using the computed optcost functions as outside cost functions to solve for $l_{min}$ itself. This can
be repeated, as the optcost computed in $Q'$ could get updated when the optcost of $l_{min}$
itself is computed. This process is repeated as many times as the number of segments we
started with, i.e., $p$. Thus the recurrence is $a(m,p) \le p \cdot (1 + a(m-1, p+1))$, where $a(m-1, p+1)$
is the number of segments used for solving $Q'$, $1 + a(m-1, p+1)$ is the number of segments
used for solving for $l_{min}$, and $p(1 + a(m-1, p+1))$ accounts for the repetitions. Solving this, one can
easily check that $a(m,p)$ is at most triply exponential in the number of locations $m$ of the
reset-free component Qp. Having obtained a bound on the number of affine segments, it is easy to
see that Algorithm 1 terminates; the time taken to compute almost optimal strategies and
optcost functions is triply exponential.
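
To make the recurrence concrete, the following Python sketch unrolls its right-hand side for small inputs. The function name `segment_bound` is hypothetical, and the base case $a(0, p) = p$ is an assumption made here purely for illustration; the text above does not state a base case.

```python
def segment_bound(m, p, base=lambda p: p):
    """Unroll a(m, p) <= p * (1 + a(m - 1, p + 1)) down to m = 0.

    `base` supplies the assumed value of a(0, p); it defaults to p.
    """
    if m < 0 or p < 0:
        raise ValueError("m and p must be non-negative")
    if m == 0:
        return base(p)
    return p * (1 + segment_bound(m - 1, p + 1, base))

# Under the assumed base case: a(1, 2) <= 2 * (1 + 3) = 8.
assert segment_bound(1, 2) == 8
```

The recursion depth equals $m$, so the unrolling always terminates; a different base case can be supplied through the `base` parameter.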


